Eat your vegetables 🥦: please, please document your data stack 🥞

With Prequel.ai, adding context to everything in your data stack is a breeze 💨

Joseph Moon
3 min read · Jun 12, 2021

The science lab analogy

Imagine a science lab that has a bunch of test tubes with all sorts of technically sophisticated biological experiments (I’m talkin’ CRISPR, immuno-oncology, mRNA therapeutics). Now imagine that none of the test tubes have any labels, so you don’t know the context of any of the experiments.

Does this sound crazy? Good, because it is. And this is the state of the modern data stack today, in which a cloud-native data warehouse serves as the central source of truth for data applications. Documentation for the tables, columns, and queries in your data warehouse is nowhere to be seen. The method du jour is frantically searching Google Sheets and Notion, or Slacking your coworkers at 2 AM before an important analysis is due. There are deprecated and duplicated tables everywhere, and a half dozen telemetry sources with different schema conventions dumping into a vast sea of half-usable data. And the metric you used for that important dashboard probably isn’t consistent with the ad-hoc analysis you are doing in SQL right now, and you don’t know why.

Good documentation = good hygiene

But writing documentation is boring. It’s like eating your vegetables. You know it’s good for you, but you just don’t want to do it. Version-controlling your code is also boring, but we got ahead of our own laziness by adopting tools that lower the activation energy needed to keep things nice and tidy. The Git and GitHub paradigm brought us the code hygiene necessary to build amazing software products collaboratively, without having to email code back and forth and manually fix synchronization issues. An amazing feat of engineering. Similarly, data workers need a low-friction, low-activation-energy way to document the data stack and add context to their work.

The data warehouse-native documentation system

Combine the two trends, the rise of the cloud data warehouse and the need for documentation, and you come to one conclusion: we need a data warehouse-native documentation system. And to let users be lazy, we need automation and plenty of integrations that make it almost trivial to write good documentation. Now the question is: how?

Extract and synchronize metadata automatically. Connect to Snowflake, BigQuery, and Redshift and automatically milk metadata insights: schema information, query logs, version history of tables, lineage, and more.

The Data Discovery feature automatically scrapes relevant metadata and builds a context hub for your data and queries.
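To make the idea of automatic metadata extraction concrete, here is a minimal sketch in Python. It is not Prequel.ai's implementation. It uses an in-memory SQLite database as a stand-in for a real warehouse (an assumption, so the example runs anywhere), and the function name `extract_schema_metadata` and the sample tables are hypothetical. Against Snowflake, BigQuery, or Redshift you would read the warehouse's own schema catalog instead of SQLite's `PRAGMA table_info`.

```python
import sqlite3


def extract_schema_metadata(conn):
    """Return {table: [{"column", "type", "description"}, ...]} for every user table.

    Hypothetical helper for illustration; SQLite stands in for the warehouse.
    """
    table_names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    catalog = {}
    for table in table_names:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [
            # The description starts empty: this is the slot a human fills in.
            {"column": col[1], "type": col[2], "description": ""}
            for col in columns
        ]
    return catalog


# Build a tiny stand-in "warehouse" with two made-up tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount_usd REAL)"
)
conn.execute("CREATE TABLE customers (customer_id INTEGER, signup_date TEXT)")

catalog = extract_schema_metadata(conn)
for table, columns in catalog.items():
    print(table, [c["column"] for c in columns])
```

The point of the sketch is the shape of the result: the machine fills in table names, column names, and types, and leaves an empty description for each column, so the only thing left for you to write is the context.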

Enable SQL blocks inside the note-taking space. Think Jupyter notebooks, but for SQL. You want to write documentation on queries and datasets while keeping the context live and fresh. A notebook format for documenting SQL queries is also the perfect way to build a knowledge base for your team or to share one-off insights (a la Notion).

The QueryDoc interface enabling SQL blocks inside a note-taking workspace.
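Here is one way the "SQL blocks inside a note" idea could be sketched in Python. Again, this is an illustration under assumptions, not how QueryDoc is built: the `TextBlock` and `SqlBlock` classes, the `render` function, and the `orders` table are all hypothetical, and SQLite stands in for the warehouse. The key property is that SQL blocks are re-run every time the note is rendered, so the numbers in the documentation never go stale.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str


@dataclass
class SqlBlock:
    query: str


def render(blocks, conn):
    """Render a note as plain text, re-running each SQL block against conn."""
    lines = []
    for block in blocks:
        if isinstance(block, TextBlock):
            lines.append(block.text)
        else:
            cursor = conn.execute(block.query)
            header = [col[0] for col in cursor.description]
            lines.append("> " + block.query)
            lines.append(" | ".join(header))
            for row in cursor.fetchall():
                lines.append(" | ".join(str(value) for value in row))
    return "\n".join(lines)


# Stand-in warehouse with a made-up orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount_usd INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 15)])

note = [
    TextBlock("Orders overview"),
    TextBlock("One row per order; amount_usd is the order total in dollars."),
    SqlBlock(
        "SELECT COUNT(*) AS n_orders, SUM(amount_usd) AS revenue_usd FROM orders"
    ),
]

first = render(note, conn)
print(first)

# New data lands in the table; rendering again picks it up automatically.
conn.execute("INSERT INTO orders VALUES (3, 20)")
second = render(note, conn)
print(second)
```

Rendering the same note twice gives different results once a new order arrives, which is exactly what "live and fresh" context means: the prose stays put while the query output tracks the data.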

The future is already here

Imagine a future world in which you can open up an app, hit Command+K, and immediately start searching for the critical context about the tables you need for the analysis your CEO asked for ahead of their board meeting. You can search for all the queries written against a table, find all the documentation that’s been written for it, and see who uses it frequently so you can snoop in on their queries and notes. That future is already here in Prequel.ai. Sign up today at prequel.ai.

P.S. We have a few slots left in our Prequel 100 Lifetime Membership promo. For $100, you can get lifetime access to our software. Click here to learn more.


Joseph Moon

Data Scientist, Entrepreneur, Investor. Harvard & MIT. LinkedIn.com/in/yosupmoon @josephmoon_ai