The Finish Slime

Dbt Alternatives - Modern Data Transformation Tools

Does your data team want to start using a data transformation tool, or are you looking for a dbt alternative? Great, then you’re in the right place.

dbt led a small revolution in the data transformation world, popularizing ELT as a pattern and enabling thousands of analysts to become analytics engineers.

The analytics engineering revolution makes dbt look huge, like the default choice for transforming data. But that’s a fallacy.

There have always been good alternatives to dbt, even before it arrived, just with slightly different target audiences.
And multiple great tools have grown up in dbt’s shadow and are now ready for prime time.

So today, there are alternatives to dbt that may suit your workflow better, whatever type of data hero you are.

Data transformation vs. everything else

TL;DR: What you need is a data transformation AND processing tool. (And dbt does both!)

Let’s first get an understanding of the task at hand. In the past, dbt propagated this picture:

But as dbt has already realized, the correct picture of what dbt is used for looks like this:

It turns raw data into something more, and not just inside a data warehouse (there are dbt adapters for data lakes via Spark) and not just by “renaming & casting”. It’s the complete processing of data.

What dbt does, and what we all need, is a way of doing data transformation AND processing. If you look at the Unified Data Infrastructure below, what we’re trying to find is a solution for data processing and transformation that doesn’t necessarily include a query engine.

The two great things about dbt

TL;DR: Your tool should have versioning support AND either SQL or Python support.

dbt introduced two small but important innovations into many people’s data workflows:

Templated (!) SQL
Versionable transformation + processing code

Templated SQL is great because SQL has a low barrier to entry, making dbt accessible even to analysts. dbt has since expanded to include Python models. “Templated” means you can do slightly more than with plain SQL, including reusable code and basic control flow such as loops and conditionals.
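To make “templated” concrete, here is a minimal sketch of the classic pivot pattern. dbt would write the loop in Jinja inside the model file; to keep the sketch dependency-free, it swaps in plain Python string building instead, and it runs the rendered SQL against an in-memory SQLite database standing in for a warehouse. The `payments` table, its columns, and the helper names `pivot_sum` and `render_orders_model` are hypothetical.

```python
import sqlite3


def pivot_sum(column, values, amount_col="amount"):
    """Hypothetical reusable 'macro': one SUM(CASE ...) column per value.

    dbt would express this as a Jinja for-loop; plain Python stands in here.
    """
    return ",\n    ".join(
        f"SUM(CASE WHEN {column} = '{v}' THEN {amount_col} ELSE 0 END) AS {v}_amount"
        for v in values
    )


def render_orders_model(payment_methods):
    """Render the (hypothetical) model into plain SQL the database can run."""
    return (
        "SELECT\n"
        "    order_id,\n"
        f"    {pivot_sum('payment_method', payment_methods)}\n"
        "FROM payments\n"
        "GROUP BY order_id"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (order_id INTEGER, payment_method TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO payments VALUES (?, ?, ?)",
        [(1, "card", 10.0), (1, "coupon", 2.5), (2, "card", 7.0)],
    )
    sql = render_orders_model(["card", "coupon"])
    print(sql)  # the SQL that actually gets executed
    for row in conn.execute(sql + " ORDER BY order_id"):
        print(row)
```

Adding a payment method means adding one list entry instead of copying another `SUM(CASE ...)` line. Only interpolate trusted, code-controlled values this way; user-supplied values belong in bound parameters.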

Versionable code is excellent for auditability, but the essential part is what versioning enables: once your data transformation code is versionable, it also becomes testable, deployable to different environments, and much more.
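As a rough illustration of “deployable to different environments”, the sketch below builds the same versioned model into a dev or a prod schema depending on a single setting, similar in spirit to dbt targets. It assumes SQLite with one attached in-memory database per environment as a stand-in for warehouse schemas; the `TARGETS` mapping, the `DATA_ENV` variable, and all table names are hypothetical.

```python
import os
import sqlite3

# Hypothetical environment -> schema mapping (similar in spirit to dbt "targets").
TARGETS = {"dev": "analytics_dev", "prod": "analytics"}

# The versioned model: identical text for every environment, only the schema differs.
MODEL_SQL = """
CREATE TABLE {schema}.daily_orders AS
SELECT order_date, COUNT(*) AS n_orders
FROM main.raw_orders
GROUP BY order_date
"""


def deploy(conn, env):
    """Build the model in the schema that belongs to `env` and return that schema."""
    schema = TARGETS[env]  # unknown env -> KeyError, instead of silently hitting prod
    conn.execute(MODEL_SQL.format(schema=schema))
    return schema


def make_demo_connection():
    """In-memory SQLite with one attached database per environment."""
    conn = sqlite3.connect(":memory:")
    for schema in TARGETS.values():
        conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, order_date TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [(1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-02")],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_connection()
    env = os.environ.get("DATA_ENV", "dev")  # hypothetical variable name
    schema = deploy(conn, env)
    print(schema, conn.execute(f"SELECT * FROM {schema}.daily_orders ORDER BY order_date").fetchall())
```

Because the model text never changes between environments, a change reviewed and tested in dev is exactly what gets deployed to prod.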

Dbt alternatives

This isn’t just a plain old list of random tools. It includes only tools that offer both of the following:

SQL support or Python support
Support for versioned code

Here they are

#1 SQLMesh

Website: https://sqlmesh.com/
Free/OS: yes
SQL support: yes, templated
Python support: yes
Versionable code: yes
Focus: SQLMesh focuses more on data engineers and on giving them a great development environment; it makes versioning & testing first-class citizens.
Comment from us: New and completely underrated tool!

#2 Databricks

Website: https://www.databricks.com/
Free/OS: No (free trial available)
SQL support: yes, templated
Python support: yes
Versionable code: yes
Focus: Databricks offers a unified platform, which makes it great for whole departments to use.
Comment from us: Databricks has previously focused mostly on data scientists, ML engineers and data engineers with heavy technical skills. But that has changed. Today, Databricks has everything and more (including an out-of-the-box dashboarding tool).

#3 Datameer (Snowflake only)

Website: https://www.datameer.com/
Free/OS: No
SQL support: yes, no template support
Python support: No
Versionable code: yes (inside the tool)
Focus: Both less technical users (it offers a GUI) and SQL-heavy users. Snowflake only.

#4 DIY Python

Website: None
Free/OS: yes
SQL support: (yes) see #5
Python support: yes
Versionable code: yes!
Focus: Experienced Python engineers.
Comment from us: The benefits of DIY Python all come from using the full power of a proper programming language: writing and reusing modules, testing, and versioning. This can be deployed using orchestrators like Apache Airflow, but it doesn’t have to be. Sometimes cron will do the job just fine.
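A minimal sketch of what such a DIY pipeline might look like, assuming a CSV source and SQLite as the target (both stand-ins; the `orders` columns and the crontab path are hypothetical). Each step is a plain function, so it can live in its own module, be unit-tested, and be versioned like any other code.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Parse raw CSV (a string here; a real job would read a file or API response)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Cast types, normalize country codes, and drop rows without an amount."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue
        out.append((int(row["order_id"]), row["country"].strip().upper(), float(row["amount"])))
    return out


def load(conn, records):
    """Idempotent load: re-running the job replaces rows with the same order_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def main(conn, csv_text):
    load(conn, transform(extract(csv_text)))


# Example crontab entry (hypothetical path) running the job every night at 02:00:
# 0 2 * * * /usr/bin/python3 /opt/pipelines/orders.py
if __name__ == "__main__":
    sample = "order_id,country,amount\n1, de ,10.5\n2,us,\n3,Fr,4\n"
    conn = sqlite3.connect(":memory:")
    main(conn, sample)
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping `transform` free of I/O is the design choice that makes it trivially testable; `extract` and `load` are the only parts that touch the outside world.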

#5 SQL Heavy: Parametrized SQL Statements

Website: None
Free/OS: yes
SQL support: yes
Python support: see #4
Versionable code: yes!
Focus: Experienced SQL engineers.
Comment from us: Parametrized SQL statements are usually wrapped in Python code. They are either hand-built or use Jinja, just like dbt does. The benefits are reusable code, the ability to test small snippets such as a single CTE, and the like. This can be deployed using orchestrators like Apache Airflow, but it doesn’t have to be. Sometimes cron will do the job just fine.
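A minimal sketch of the SQL-heavy variant, assuming SQLite and a hypothetical `orders` table: the CTE lives in its own snippet, values such as the cutoff date are passed as bound parameters (`:since`, `:min_orders`) rather than pasted into the string, and a small test exercises the CTE on fixture rows. The names `ACTIVE_CUSTOMERS_CTE`, `active_customers_query`, and `run` are illustrative only.

```python
import sqlite3

# A small, reusable CTE kept as its own snippet so it can be tested in isolation.
ACTIVE_CUSTOMERS_CTE = """
active_customers AS (
    SELECT customer_id, COUNT(*) AS n_orders
    FROM orders
    WHERE order_date >= :since
    GROUP BY customer_id
    HAVING COUNT(*) >= :min_orders
)
"""


def active_customers_query():
    """Wrap the CTE into a full statement; values are bound later, never pasted in."""
    return (
        f"WITH {ACTIVE_CUSTOMERS_CTE} "
        "SELECT customer_id, n_orders FROM active_customers ORDER BY customer_id"
    )


def run(conn, since, min_orders):
    params = {"since": since, "min_orders": min_orders}
    return conn.execute(active_customers_query(), params).fetchall()


def test_active_customers_cte():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [
        (1, "2024-03-01"), (1, "2024-03-05"),  # two recent orders -> active
        (2, "2024-03-02"),                     # only one recent order
        (3, "2023-12-31"), (3, "2024-01-01"),  # orders too old for the first cutoff
    ])
    assert run(conn, since="2024-02-01", min_orders=2) == [(1, 2)]
    assert run(conn, since="2023-01-01", min_orders=2) == [(1, 2), (3, 2)]


if __name__ == "__main__":
    test_active_customers_cte()
    print("ok")
```

The test can run under pytest as-is (it is a plain function named `test_...`), or be called directly as in the `__main__` block.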

#6 Notebooks / Apache Spark / Pandas

Website: https://jupyter.org/
Free/OS: yes (depends on deployment & orchestration, paid options available)
SQL support: yes
Python support: yes
Versionable code: yes

#7 AWS Glue (or the GCP or Azure alternatives)

Website: https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html
Free/OS: no
SQL support: no
Python support: yes (& Scala)
Versionable code: yes

#8 Keboola

Website: https://www.keboola.com/
Free/OS: no
SQL support: yes (incl. dbt runner)
Python support: yes (& R, Julia)
Versionable code: yes
Focus: Complete package for E(t)LT, including orchestration.

#9 SDF

Website: https://www.sdf.com/
Free/OS: no (still in beta?)
SQL support: yes
Python support: no
Versionable code: yes
Focus: SQL development for enterprises. A good comparison article is “dbt vs sdf vs SQLMesh”.

#10 Datacoves

Website: https://datacoves.com/
Free/OS: No (Contact for a free trial)
SQL support: yes
Python support: yes
Versionable code: yes
Focus: The Datacoves platform helps enterprises overcome their data delivery challenges quickly using dbt and Airflow, implementing best practices from the start without the need for multiple vendors or costly consultants.

Want to add your dbt alternative? Just fill out the basic info and ping me on Twitter/LinkedIn.