Finish Slime #16

Hot/Cold data, dbt lineage, stakeholder management, data mesh, dbt at scale, CLV, Airflow.

Data Engineering, Analytics. No ML, no AI. Your weekly dose of the data content you actually want to read!

Want to share anything with me? Hit me up on Twitter @sbalnojan or LinkedIn.

The first big crisis for data teams was technology. The next one is getting closer to the business. You can handle it by walking in your consumers' shoes, getting to know the people, and creating feedback loops.

Barr Moses | 25 Apr 2023

Hot data is data you use frequently; cold data is data you rarely touch. By storing hot and cold data separately, you can save 20-80% of your costs. However, you need a query engine like Apache Doris to tie the two storage tiers together.

ApacheDoris | 2 Jul 2023
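The core of a hot/cold split is a rule that decides which data belongs on which tier. The sketch below assumes per-partition "last accessed" timestamps and a 30-day hot window; both the partition names and the threshold are hypothetical, and Apache Doris configures tiering declaratively rather than through code like this. It only illustrates the classification idea.

```python
from datetime import datetime, timedelta


def classify_partitions(last_access, now, hot_window_days=30):
    """Label each partition 'hot' or 'cold' by its last access time.

    last_access: dict mapping a (hypothetical) partition name to the
    datetime it was last queried. Anything accessed within the hot
    window stays on fast storage; everything else can move to cheaper
    (e.g. object) storage.
    """
    cutoff = now - timedelta(days=hot_window_days)
    return {
        name: "hot" if accessed >= cutoff else "cold"
        for name, accessed in last_access.items()
    }


tiers = classify_partitions(
    {"p_2023_06": datetime(2023, 6, 20), "p_2023_01": datetime(2023, 1, 15)},
    now=datetime(2023, 7, 2),
)
print(tiers)
```

The savings come from the cold side: the more partitions fall outside the hot window, the more data can sit on cheaper storage.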

dbt Lineage Diff is a fun tool from PipeRider (which has a free edition for individual developers). It shows you a diff of your lineage before you commit a change, so you can see how many models the change affects.

Dave Flynn | 28 Jun 2023
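Answering "how many models does this change affect?" comes down to walking the lineage graph downstream from the changed models. Below is a minimal breadth-first sketch, assuming the lineage is available as a parent-to-children mapping (dbt's `manifest.json` contains a `child_map` that could serve this purpose). The model names are hypothetical, and this is not PipeRider's implementation.

```python
from collections import deque


def downstream_models(children, changed):
    """Return the changed models plus every model downstream of them.

    children: dict mapping a model name to the models that select from it.
    changed: iterable of model names touched by the change.
    """
    affected = set(changed)
    queue = deque(affected)
    while queue:
        model = queue.popleft()
        for child in children.get(model, []):
            if child not in affected:  # visit each model once, even in diamond-shaped DAGs
                affected.add(child)
                queue.append(child)
    return affected


lineage = {
    "stg_orders": ["fct_orders"],
    "fct_orders": ["rpt_revenue", "rpt_churn"],
    "stg_customers": ["dim_customers"],
}
print(sorted(downstream_models(lineage, ["stg_orders"])))
```

Comparing this set for the graph before and after a change gives the "blast radius" a lineage diff visualizes.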

Robert argues that data teams need to take ownership of how data is used and understand how decisions are made in their company. Otherwise, they won't be able to drive value with the data they deliver.

Robert Yi | 28 Jun 2023

Data meshes are hot, but they might fail in your organization. The author already published a first part with six possible reasons; this follow-up is another good checklist of things to keep in mind: people stay the same, data meshes are a lure, and they will keep costing serious money.

Hannes R5 | 27 Jun 2023

Checkout runs dbt at serious scale, with 27+ projects and 100+ dbt developers. This article describes the CI/CD processes that are essential at that scale.

Damien Siemieniak | 25 Apr 2023
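With dozens of dbt projects in one repository, a common CI pattern is to test only the projects a pull request touches, and within each, only modified models and their children. The sketch below assumes a hypothetical `projects/<name>` layout and a directory holding production artifacts for state comparison; it is one possible approach, not necessarily Checkout's. The dbt flags used (`--project-dir`, `--select state:modified+`, `--defer`, `--state`) are standard dbt CLI options.

```python
from pathlib import PurePosixPath


def projects_to_test(changed_files, project_dirs):
    """Return the project directories that contain at least one changed file."""
    affected = []
    for project in project_dirs:
        root = PurePosixPath(project)
        # Compare whole path components, so "projects/pay" does not match "projects/payments/...".
        if any(root in PurePosixPath(path).parents for path in changed_files):
            affected.append(project)
    return affected


def slim_ci_command(project_dir, state_dir):
    """Build a dbt command that runs only modified models and their descendants,
    deferring unchanged upstream refs to the production state in state_dir."""
    return [
        "dbt", "build",
        "--project-dir", project_dir,
        "--select", "state:modified+",
        "--defer",
        "--state", state_dir,
    ]


changed = ["projects/payments/models/fct_payments.sql", "README.md"]
for project in projects_to_test(changed, ["projects/payments", "projects/risk"]):
    print(" ".join(slim_ci_command(project, "prod-artifacts")))
```

Skipping untouched projects entirely is what keeps CI time roughly proportional to the size of a change rather than the size of the monorepo.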

Customer lifetime value (CLV) is an important metric, and every data engineer and analyst should know its basics. This tutorial covers them.

Katherine Munro | 2 Jul 2023
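One common back-of-the-envelope CLV formula is average order value × purchase frequency × expected customer lifespan × gross margin. The sketch below computes it from a list of `(customer_id, amount)` orders; the formula variant, the sample orders, and the lifespan and margin inputs are assumptions for illustration and may differ from the tutorial's approach.

```python
def simple_clv(orders, observation_years, expected_lifespan_years, gross_margin):
    """Estimate CLV as AOV * yearly purchase frequency * lifespan * margin.

    orders: list of (customer_id, amount) tuples observed over
    observation_years.
    """
    revenue = sum(amount for _, amount in orders)
    n_orders = len(orders)
    n_customers = len({customer for customer, _ in orders})
    avg_order_value = revenue / n_orders
    # Orders per customer per year over the observation window.
    purchase_frequency = n_orders / n_customers / observation_years
    return avg_order_value * purchase_frequency * expected_lifespan_years * gross_margin


orders = [("a", 40), ("a", 60), ("b", 50), ("c", 30), ("c", 70), ("c", 50)]
print(simple_clv(orders, observation_years=1, expected_lifespan_years=3, gross_margin=0.25))
```

Here the average order value is 50 and each customer orders twice a year, so three years at a 25% margin gives a CLV of 75. More refined models replace the fixed lifespan with churn or retention estimates.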

KOHO runs 30k DAGs a day on AWS-managed Airflow. They use several techniques to make Airflow scale well, including right-sizing their infrastructure, dynamic DAG generation, distributing scheduling and workloads, and more.

Manikandan Paramasivan | 5 Jul 2023
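Two of these ideas combine naturally: generate many similar DAGs from one config list, and stagger their schedules so they don't all fire at the top of the hour. The sketch below is plain Python with a hypothetical table list and an assumed hourly cadence; it is not KOHO's code. In an actual Airflow DAG file, the loop would instantiate one Airflow `DAG` per spec and expose it at module level (for example via `globals()`) so the scheduler can discover it.

```python
import hashlib


def staggered_hourly_cron(dag_id):
    """Derive a stable minute offset from the DAG id to spread start times.

    hashlib is used instead of the built-in hash(), whose output for
    strings changes between Python processes.
    """
    digest = hashlib.sha256(dag_id.encode("utf-8")).hexdigest()
    minute = int(digest, 16) % 60
    return f"{minute} * * * *"


def build_dag_specs(source_tables):
    """Generate one ingestion DAG spec per (hypothetical) source table."""
    specs = {}
    for table in source_tables:
        dag_id = f"ingest_{table}"
        specs[dag_id] = {"table": table, "schedule": staggered_hourly_cron(dag_id)}
    return specs


for dag_id, spec in build_dag_specs(["orders", "payments", "users"]).items():
    print(dag_id, spec["schedule"])
```

Because the offset depends only on the DAG id, each DAG keeps the same slot across scheduler restarts while the fleet as a whole spreads its load across the hour.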

Exclusive Gift 🎁

We've got an exclusive gift for you: the Data Engineering Reading Guide, with the best books, articles & resources you must read!

And, of course, our (public)
