Finish Slime #8

Table formats, roadmaps, airflow, mage, kestra & data entropy.

Sven Balnojan
June 14, 2023

Data Engineering, Analytics. No ML, no AI. The weekly dose of the data content you actually want to read!

Resources

Are You a Data Ticket Taker or Decision Maker?

Data teams can be either reactive or proactive. Reactive teams fulfill requests from other departments, while proactive teams use data to identify new opportunities and drive growth. In today's data-driven world, proactive data teams are more valuable than ever.

The data view from AWS re:Invent

AWS re:Invent is a yearly conference for cloud computing professionals. At the 2022 conference, AWS announced several new data-related features and services. These features include new machine learning and analytics services and new tools for managing data lakes and warehouses.

From Support to Growth Oriented Data Teams (Scaling Your Data Team, Transition #1)

Data teams have traditionally focused on supporting existing business operations. Increasingly, they are asked to take an active role in driving growth, which means understanding the business's needs and using data to identify new opportunities.

Hello Product Data Team, Goodbye Ad-Hoc Work

Data teams have often worked in silos, providing ad-hoc reports and analyses to business users, an approach that is no longer sustainable. Instead, data teams must be integrated into the product development process, working closely with product managers and engineers to ensure data is used to inform product decisions.

Data Entropy: More Data, More Problems?

As the amount of data we collect and store continues to grow, so does the problem of data entropy. Data entropy is the tendency for data to become less valuable over time. This can happen for several reasons, including data corruption, data loss, and changes in the data schema. Data entropy can have a significant impact on the ability of businesses to make informed decisions.
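One of the causes named above, changes in the data schema, is easy to check for mechanically. The following is a minimal sketch, not something taken from the linked article: the function name `schema_drift` and the example column names are hypothetical, and schemas are assumed to be plain `{column: type}` dictionaries.

```python
def schema_drift(expected, observed):
    """Compare two {column: type} mappings and report what changed.

    Hypothetical helper for illustration; schemas are plain dicts here.
    """
    shared = expected.keys() & observed.keys()
    return {
        # Columns consumers rely on that no longer arrive
        "missing": sorted(expected.keys() - observed.keys()),
        # Columns that appeared without anyone agreeing on them
        "unexpected": sorted(observed.keys() - expected.keys()),
        # Columns that still exist but changed type
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }


# Made-up example schemas
expected = {"order_id": "int", "amount": "float", "created_at": "timestamp"}
observed = {"order_id": "str", "amount": "float", "channel": "str"}

print(schema_drift(expected, observed))
```

Running a check like this on every load makes schema changes visible when they happen, rather than when a dashboard quietly breaks.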

Tutorials

Roadmap for Data Engineering 2023

Data engineering is constantly evolving, and the landscape is expected to change even more in 2023. Key trends expected to shape the field include the rise of cloud-based data engineering platforms, the increasing use of artificial intelligence and machine learning in data engineering, and the need for data engineers to have a deep understanding of data quality and data governance.

Choosing an open table format for your transactional data lake on AWS

When choosing an open table format for your transactional data lake on AWS, there are a few factors to consider: the size of your data, the type of analysis you will be doing, and the tools you will use to access your data. Popular open table formats include Apache Hudi, Apache Iceberg, and Delta Lake. Parquet and ORC are not table formats themselves but the columnar file formats that table formats store data in; both suit large data sets analyzed with SQL engines or Spark. A table format such as Delta Lake adds transactional updates on top, which makes it a good choice for data sets that need to be updated frequently.
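To make "data sets that need to be updated frequently" concrete, here is a toy sketch of the general idea behind transactional table formats: data files are never modified in place, and an append-only commit log records which files currently make up the table. This is an illustration under simplifying assumptions, not the actual implementation of Delta Lake or any other format; the class `ToyTable`, the `_log` directory layout, and the file names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ToyTable:
    """Toy model: immutable data files tracked by an append-only commit log."""

    def __init__(self, root):
        self.log_dir = Path(root) / "_log"  # hypothetical layout
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def commit(self, add=(), remove=()):
        """Record the files added and removed by one update; return its version."""
        version = len(list(self.log_dir.glob("*.json")))
        path = self.log_dir / f"{version:08d}.json"
        # Mode "x" fails if the file exists, so two writers cannot both
        # claim the same version number.
        with open(path, "x") as f:
            json.dump({"add": list(add), "remove": list(remove)}, f)
        return version

    def live_files(self, as_of=None):
        """Replay the log to list the files that form the table at a version."""
        files = set()
        for path in sorted(self.log_dir.glob("*.json")):
            if as_of is not None and int(path.stem) > as_of:
                break
            entry = json.loads(path.read_text())
            files -= set(entry["remove"])
            files |= set(entry["add"])
        return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    table = ToyTable(tmp)
    table.commit(add=["part-0.parquet"])
    # An "update" rewrites the affected file and swaps it in one commit
    table.commit(add=["part-1.parquet"], remove=["part-0.parquet"])
    print(table.live_files())         # current state of the table
    print(table.live_files(as_of=0))  # the table as of the first commit
```

Because readers only ever see files listed in the log, a half-finished write is invisible until its commit lands, and older versions stay readable by replaying fewer log entries.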