Finish Slime #8

Table formats, roadmaps, airflow, mage, kestra & data entropy.

Data Engineering, Analytics. No ML, no AI. The weekly dose of the data content you actually want to read!


Data teams can be either reactive or proactive. Reactive teams fulfill requests from other departments, while proactive teams use data to identify new opportunities and drive growth. In today's data-driven world, proactive data teams are more valuable than ever.

AWS re:Invent is a yearly conference for cloud computing professionals. At the 2022 conference, AWS announced several new data-related features and services, including new analytics and machine learning services as well as new tools for managing data lakes and warehouses.

Historically, data teams focused on supporting existing business operations. In today's data-driven world, however, they are increasingly asked to play an active role in driving growth. That means understanding the business's needs and using data to identify new opportunities.

Data teams have also often worked in silos, delivering ad-hoc reports and analyses to business users. That approach no longer scales. Data teams must be integrated into the product development process, working closely with product managers and engineers so that data actually informs product decisions.

As the amount of data we collect and store continues to grow, so does the problem of data entropy. Data entropy is the tendency for data to become less valuable over time. This can happen for several reasons, including data corruption, data loss, and changes in the data schema. Data entropy can have a significant impact on the ability of businesses to make informed decisions.
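One of the entropy sources above, schema change, is easy to make concrete. The sketch below is a toy illustration, not from any library: `detect_schema_drift` is a hypothetical helper that compares the fields present in an old batch of records against a new one, the kind of check that catches silent schema drift before it erodes a dataset's value.

```python
def detect_schema_drift(old_rows, new_rows):
    """Return (added, removed) field names between two record batches."""
    old_fields = set().union(*(row.keys() for row in old_rows))
    new_fields = set().union(*(row.keys() for row in new_rows))
    return sorted(new_fields - old_fields), sorted(old_fields - new_fields)

old = [{"id": 1, "email": "a@example.com"}]
new = [{"id": 2, "contact_email": "b@example.com"}]

added, removed = detect_schema_drift(old, new)
print(added)    # ['contact_email']
print(removed)  # ['email']
```

A downstream report that still reads `email` would quietly go stale here, which is exactly how entropy creeps in.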


Data engineering is constantly evolving, and the landscape is expected to change even more in 2023. Key trends expected to shape the field include the rise of cloud-based data engineering platforms, the increasing use of artificial intelligence and machine learning in data engineering, and the need for data engineers to have a deep understanding of data quality and data governance.

When choosing an open table format for your transactional data lake on AWS, there are a few factors to consider: the size of your data, the type of analysis you will be doing, and the tools you will use to access it. The main open table formats are Apache Iceberg, Apache Hudi, and Delta Lake; note that Parquet and ORC are columnar file formats that sit underneath these table formats, not table formats themselves. Roughly, Iceberg is a good fit for large analytical tables with broad engine support, Hudi suits workloads with frequent record-level updates and incremental ingestion, and Delta Lake integrates most tightly with Spark.
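The feature these table formats share is transactional row-level updates (merge/upsert) on top of immutable files. The sketch below is a toy illustration of those upsert semantics only; `merge_rows` is a hypothetical helper, and real formats implement this with transaction logs and snapshots rather than in-memory dicts.

```python
def merge_rows(table, updates, key="id"):
    """Upsert: replace rows whose key matches an update, append the rest."""
    merged = {row[key]: row for row in table}  # existing table state, by key
    for row in updates:
        merged[row[key]] = row                 # overwrite match or insert new
    return sorted(merged.values(), key=lambda r: r[key])

table = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
updates = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]

print(merge_rows(table, updates))
# [{'id': 1, 'status': 'new'}, {'id': 2, 'status': 'shipped'}, {'id': 3, 'status': 'new'}]
```

Without a table format, expressing this on a data lake means rewriting whole files by hand; with one, it is a single MERGE statement.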

How did you like this edition?
