Finish Slime #16

Hot/Cold data, dbt lineage, stakeholder management, data mesh, dbt at scale, airflow.

Sven Balnojan
August 09, 2023

Data Engineering, Analytics. No ML, no AI. The weekly dose of the data content you actually want to read!

Want to share anything with me? Hit me up on Twitter @sbalnojan or LinkedIn.

The Next Big Crisis for Data Teams

The first big crisis for data teams was technology. The next big one is getting closer to the business. You can handle it by walking in your consumers' shoes, getting to know the people, and creating feedback loops.

Barr Moses | 25 Apr 2023

Hot-Cold Data Separation: What, Why, and How?

Hot data is data you use frequently; cold data is data you rarely touch. By storing hot and cold data separately, you can save 20-80% of your cost. However, you need a querying framework like Apache Doris to reconcile the two storage options.

ApacheDoris | 2 Jul 2023
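
To make the idea concrete, here is a minimal sketch of hot/cold tiering by last access date. It is not how Apache Doris implements it; the partition names, the 30-day window, and both per-GB prices are made-up assumptions for illustration only.

```python
from datetime import date, timedelta

# Hypothetical monthly prices per GB; real prices depend on your provider.
HOT_PRICE_PER_GB = 0.10
COLD_PRICE_PER_GB = 0.02


def split_hot_cold(partitions, today, hot_window_days=30):
    """Split partitions into (hot, cold) lists by their last access date."""
    cutoff = today - timedelta(days=hot_window_days)
    hot = [p for p in partitions if p["last_accessed"] >= cutoff]
    cold = [p for p in partitions if p["last_accessed"] < cutoff]
    return hot, cold


def monthly_cost(partitions, price_per_gb):
    """Storage cost of a list of partitions at a flat per-GB price."""
    return sum(p["size_gb"] for p in partitions) * price_per_gb


# Hypothetical partitions of an events table.
partitions = [
    {"name": "events_2023_07", "size_gb": 100, "last_accessed": date(2023, 7, 30)},
    {"name": "events_2023_01", "size_gb": 300, "last_accessed": date(2023, 2, 15)},
    {"name": "events_2022_06", "size_gb": 600, "last_accessed": date(2022, 9, 1)},
]

hot, cold = split_hot_cold(partitions, today=date(2023, 8, 9))

all_hot_cost = monthly_cost(partitions, HOT_PRICE_PER_GB)
tiered_cost = monthly_cost(hot, HOT_PRICE_PER_GB) + monthly_cost(cold, COLD_PRICE_PER_GB)
saving = 1 - tiered_cost / all_hot_cost

print(f"hot: {[p['name'] for p in hot]}")
print(f"cold: {[p['name'] for p in cold]}")
print(f"saving: {saving:.0%}")
```

How much you actually save depends entirely on how much of your data is cold and on the price gap between the two tiers.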

dbt Lineage Diff — Impact analysis, visualized

dbt Lineage Diff is a fun tool from PipeRider (PipeRider has a free edition for individual developers). It shows you a diff of the lineage before you commit a change, so you can see how many models the change affects.

Dave Flynn | 28 Jun 2023
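
The underlying idea, impact analysis on a lineage graph, can be sketched in a few lines. This is an illustration only, not PipeRider's implementation: the graph format (model name mapped to its parent models), the function names, and the example models are all assumptions.

```python
from collections import deque


def invert(parents):
    """Turn a model -> parents mapping into a model -> children mapping."""
    children = {}
    for model, deps in parents.items():
        for dep in deps:
            children.setdefault(dep, []).append(model)
    return children


def downstream(children, start):
    """Return every model reachable downstream of `start` (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def lineage_diff(base, head, modified):
    """Compare two lineage graphs and list models impacted by the change."""
    added = set(head) - set(base)
    removed = set(base) - set(head)
    changed = set(modified) | added
    children = invert(head)
    impacted = set()
    for model in changed:
        impacted |= downstream(children, model)
    return {"added": added, "removed": removed, "impacted": impacted - changed}


# Hypothetical lineage before the change (model -> parents).
base = {
    "stg_orders": [],
    "stg_customers": [],
    "orders": ["stg_orders"],
    "customer_ltv": ["orders", "stg_customers"],
}
# Hypothetical lineage after the change: one new model was added.
head = dict(base, orders_daily=["orders"])

diff = lineage_diff(base, head, modified={"stg_orders"})
print(diff)
```

Here a change to `stg_orders` touches both `orders` and `customer_ltv`, which is exactly the kind of thing you want to know before merging.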

How Poor Stakeholder Management Ruins Analytics

Robert argues that data teams need to take ownership of how data is used and understand how decision-making happens in their company. Otherwise, they won't be able to drive value with the data they deliver.

Robert Yi | 28 Jun 2023

Three More Reasons Why Data Mesh Will Fail

Data meshes are hot, but they might fail in your organization. The author already published a first part with six possible reasons; this follow-up is another good checklist of things to remember: people stay the same, data meshes are a lure, and they will continue to cost serious money.

Hannes R5 | 27 Jun 2023

Building dbt CI/CD at scale

Checkout runs dbt at a serious scale, with 27+ projects and 100+ dbt developers. This article describes the CI/CD processes that are essential at such a scale.

Damien Siemieniak | 25 Apr 2023

From analytics to actual application: the case of Customer Lifetime Value

Customer Lifetime Value (CLV) is an important metric, and every data engineer and analyst should know its basics. This tutorial covers them.

Katherine Munro | 2 Jul 2023
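
As a refresher, here is one common simplified way to compute CLV. It is a sketch and not necessarily the formulation used in the tutorial; the function name, its parameters, and the example numbers are assumptions for illustration.

```python
def simple_clv(avg_order_value, purchases_per_year, lifespan_years, gross_margin=1.0):
    """Simplified CLV: value per order x orders per year x years x margin.

    This ignores discounting, churn curves, and acquisition cost.
    """
    return avg_order_value * purchases_per_year * lifespan_years * gross_margin


# Hypothetical customer: 50 per order, 4 orders a year, stays 3 years, 25% margin.
clv = simple_clv(avg_order_value=50, purchases_per_year=4, lifespan_years=3, gross_margin=0.25)
print(f"CLV: {clv:.2f}")
```

More careful models estimate retention and discount future revenue, but the simple product is usually where the conversation starts.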

Running Airflow at Scale: Empowering Data Platform at KOHO

KOHO runs 30k DAGs a day, powered by AWS-managed Airflow. They use a couple of techniques to make Airflow scale well, including good infrastructure sizing, dynamic DAG generation, distributing the scheduling and workload, and more.

Manikandan Paramasivan | 5 Jul 2023
