Finish Slime #35

Polars, Bad Graphs, Unstructured Data

Sven Balnojan
December 20, 2023

Data Engineering, Analytics. No ML, no AI. The weekly dose of the data content you actually want to read!

Want to share anything with me? Hit me up on Twitter @sbalnojan or LinkedIn.

Great Recent Stuff

3 Reasons Your Company Shouldn't Invest Into Data

Sometimes the best decision is not to get into something at all. That’s also true for plenty of data projects; evaluate every single one just as you would any other investment decision.

Friends Don't Let Friends Make Bad Graphs

This is just super fun to read! So, dear friend, please don’t make a bad graph.

Wisdom of Unstructured Data: Building Airbnb’s Listing Knowledge from Big Text Data

Handling and getting information out of unstructured data is always a hassle - be sure to skim through the Airbnb approach.

Data Engineering in Retrospect: Key Trends and Patterns of 2023

This is a rare retrospective; the DEW newsletter only does this once a year. The author is in the perfect position to write it!

Executing Cron Scripts Reliably At Scale

You don’t need complex orchestrators to manage data pipelines. A lot of the time, cron is enough, and here’s how you use cron at scale.

Goodbye Spark. Hello Polars + Delta Lake.

Spark has had a tight grip on Delta Lake since the format's early days; maybe Polars can start to loosen that grip.

SQL is not Designed for Analytics

This is food for thought - Benoit makes a good point on the shortcomings of SQL and the rise of new semantic languages like Malloy.
