The Ultimate Directory of Apache Iceberg Resources

This article is a comprehensive directory of Apache Iceberg resources, including educational materials, tutorials, and hands-on exercises. Whether you’re a beginner or an experienced data engineer, this guide will help you navigate the world of Apache Iceberg and its applications.


What is Apache Iceberg?

Apache Iceberg is an open-source table format for data lakehouses. It is a standard for storing the metadata that defines a group of data files as a single table. Any tool that supports the standard can use this metadata to read and write those files the way it would a table in a data warehouse, with the same features and ACID guarantees.
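To make "metadata that defines a group of files as a table" a little more concrete, here is a deliberately simplified Python sketch. It is not Iceberg's real metadata layout or the PyIceberg API: `Snapshot`, `TableMetadata`, `ToyCatalog`, and `append_files` are hypothetical names. It only models the core idea behind Iceberg commits: each write produces a new immutable snapshot, and a commit succeeds only by atomically swapping the catalog's pointer to the current metadata, so every reader sees one complete version of the table.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One immutable version of the table: every data file it contains."""
    snapshot_id: int
    data_files: tuple[str, ...]


@dataclass(frozen=True)
class TableMetadata:
    """Simplified stand-in for a table metadata file: the snapshot history."""
    snapshots: tuple[Snapshot, ...]

    @property
    def current(self) -> Snapshot:
        return self.snapshots[-1]


class ToyCatalog:
    """Hypothetical catalog holding the single pointer to the current metadata."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._metadata = TableMetadata(snapshots=(Snapshot(0, ()),))

    def load(self) -> TableMetadata:
        return self._metadata

    def compare_and_swap(self, expected: TableMetadata, new: TableMetadata) -> bool:
        """Swap the pointer only if no other writer committed since `expected` was loaded."""
        with self._lock:
            if self._metadata is not expected:
                return False
            self._metadata = new
            return True


def append_files(catalog: ToyCatalog, new_files: list[str], max_retries: int = 3) -> Snapshot:
    """Optimistic append: build a new snapshot on the base version, then try to commit it."""
    for _ in range(max_retries):
        base = catalog.load()
        snap = Snapshot(
            base.current.snapshot_id + 1,
            base.current.data_files + tuple(new_files),
        )
        if catalog.compare_and_swap(base, TableMetadata(base.snapshots + (snap,))):
            return snap
        # Another writer won the race: reload and rebuild on the newer version.
    raise RuntimeError("commit failed after retries")


if __name__ == "__main__":
    catalog = ToyCatalog()
    reader_view = catalog.load()  # a reader pins the version it started with
    append_files(catalog, ["a.parquet", "b.parquet"])
    append_files(catalog, ["c.parquet"])
    print(catalog.load().current.data_files)  # ('a.parquet', 'b.parquet', 'c.parquet')
    print(reader_view.current.data_files)  # () because the earlier reader's view is unchanged
```

Because old snapshots are never modified, a reader that started before a commit keeps a consistent view, and a writer that loses the race simply retries on the newer version.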

Why Does it Matter?

Because tables live in a separate storage layer, you can use all your favorite analytical tools on a single copy of your data.
Reducing the number of copies can lower the compute, storage, and network costs of your overall data platform.
Storing your data in a standard format reduces future migration costs when you change tools or adopt new ones.

Who does Apache Iceberg benefit?

Data Engineers, since less data movement means fewer data pipelines to manage.
Data Analysts, since fewer data movements are needed to make data available, which gives them more immediate access. This is especially true when paired with the data virtualization in tools like Dremio, which offers lakehouse querying and federated querying (virtualization) on one platform.
Data Scientists, since they also get more immediate access to data when training their AI/ML models.
Data Leaders, since they can reduce overall platform costs, making it easier to fund other data initiatives.

Apache Iceberg Directory

Apache Iceberg Documentation

Apache Iceberg Education

Here is a list of resources to help you learn Apache Iceberg:

Apache Iceberg Crash Course: What is a Data Lakehouse and a Table Format?
Free Copy of Apache Iceberg the Definitive Guide
Free Apache Iceberg Crash Course
Iceberg Lakehouse Engineering Video Playlist

Apache Iceberg Hands-on Tutorials

Here is a list of hands-on tutorials that will help you get started with Apache Iceberg:

Hands-on Intro with Apache Iceberg
Intro to Apache Iceberg, Nessie and Dremio on your Laptop
JSON/CSV/Parquet to Apache Iceberg to BI Dashboard
From MongoDB to Apache Iceberg to BI Dashboard
From SQLServer to Apache Iceberg to BI Dashboard
From Postgres to Apache Iceberg to BI Dashboard
Mongo/Postgres to Apache Iceberg to BI Dashboard using Git for Data and DBT
Elasticsearch to Apache Iceberg to BI Dashboard
MySQL to Apache Iceberg to BI Dashboard
Apache Druid to Apache Iceberg to BI Dashboard
BI Dashboards with Apache Iceberg Using AWS Glue and Apache Superset
End-to-End Basic Data Engineering Tutorial (Spark, Apache Iceberg, Dremio, Superset)

Apache Iceberg’s Architecture

Here is a list of resources to help you learn Apache Iceberg’s architecture and internals:

The Life of a Read Query for Apache Iceberg Tables
The Life of a Write Query for Apache Iceberg Tables
Understanding Apache Iceberg’s Metadata.json
Understanding the Apache Iceberg Manifest List (Snapshot)
Understanding the Apache Iceberg Manifest
Understanding Apache Iceberg Delete Files
Puffins and Icebergs: Additional Stats for Apache Iceberg Tables
How Apache Iceberg is Built for Open Optimized Performance
Ensuring High Performance at Any Scale with Apache Iceberg’s Object Store File Layout
Row-Level Changes on the Lakehouse: Copy-On-Write vs. Merge-On-Read in Apache Iceberg
ACID Guarantees and Apache Iceberg: Turning Any Storage into a Data Warehouse
Apache Iceberg Reliability
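The copy-on-write vs. merge-on-read trade-off from the list above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Iceberg's implementation: a "data file" is an in-memory tuple of rows, and `delete_copy_on_write`, `delete_merge_on_read`, and `read_merge_on_read` are hypothetical helpers. Real Iceberg position delete files record a data file path plus row positions; here only the positions are kept, since there is a single file.

```python
from dataclasses import dataclass
from typing import Callable

Predicate = Callable[[dict], bool]


@dataclass(frozen=True)
class DataFile:
    """Data files are never modified in place once written."""
    rows: tuple[dict, ...]


def delete_copy_on_write(f: DataFile, pred: Predicate) -> DataFile:
    """Copy-on-write: pay at write time by rewriting the file without the deleted rows."""
    return DataFile(tuple(r for r in f.rows if not pred(r)))


def delete_merge_on_read(f: DataFile, pred: Predicate) -> frozenset[int]:
    """Merge-on-read: keep the data file and record only the positions of deleted rows."""
    return frozenset(i for i, r in enumerate(f.rows) if pred(r))


def read_merge_on_read(f: DataFile, deleted: frozenset[int]) -> list[dict]:
    """Readers pay at query time by skipping the recorded positions."""
    return [r for i, r in enumerate(f.rows) if i not in deleted]


def is_two(row: dict) -> bool:
    return row["id"] == 2


if __name__ == "__main__":
    original = DataFile(({"id": 1}, {"id": 2}, {"id": 3}))
    print(delete_copy_on_write(original, is_two).rows)  # ({'id': 1}, {'id': 3})
    deletes = delete_merge_on_read(original, is_two)
    print(sorted(deletes))  # [1]
    print(read_merge_on_read(original, deletes))  # [{'id': 1}, {'id': 3}]
```

Both paths return the same rows to a reader; the difference is whether the cost lands on the writer (rewriting files) or on every subsequent read (merging deletes).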

Getting Data into Apache Iceberg

Here is a list of resources to help you get data into Apache Iceberg:

8 Tools For Ingesting Data Into Apache Iceberg
Event Based Ingestion for Apache Iceberg Tables
Ingesting Data Into Apache Iceberg Tables with Dremio: A Unified Path to Iceberg
How to Create a Lakehouse with Airbyte, S3, Apache Iceberg, and Dremio
How to Convert JSON Files Into an Apache Iceberg Table with Dremio
How to Convert CSV Files into an Apache Iceberg table with Dremio
Ingesting Data into Apache Iceberg using Fivetran

Apache Iceberg Migration

Here is a list of resources to help you migrate your data to Apache Iceberg:

Migration Guide for Apache Iceberg Lakehouses
Apache XTable: Converting Between Apache Iceberg, Delta Lake, and Apache Hudi
3 Ways to Convert a Delta Lake Table Into an Apache Iceberg Table
How to Migrate a Hive Table to an Iceberg Table
Migrating a Hive Table to an Iceberg Table Hands-on Tutorial

Streaming with Apache Iceberg

Here is a list of resources to help you stream data into Apache Iceberg:

A Guide to Change Data Capture (CDC) with Apache Iceberg
Apache Kafka to Apache Iceberg to Dremio
Streaming and Batch Data Lakehouses with Apache Iceberg, Dremio and Upsolver
Using Flink with Apache Iceberg and Nessie
Streaming Data into Apache Iceberg Tables Using AWS Kinesis and AWS Glue
Adapting Iceberg for high-scale streaming data

Partitioning with Apache Iceberg

Here is a list of resources to help you learn how to partition your data with Apache Iceberg:

Simplifying Your Partition Strategies with Dremio Reflections and Apache Iceberg
Partition Evolution: Future-Proof Partitioning and Fewer Table Rewrites with Apache Iceberg
Fewer Accidental Full Table Scans Brought to You by Apache Iceberg’s Hidden Partitioning
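Hidden partitioning means queries filter on the source column (such as a timestamp) while the table derives partition values from it through a transform. The minimal Python sketch below assumes a `day` transform that stores whole days since 1970-01-01 and shows how a filter on timestamps becomes a filter on partition values, letting the engine skip files. `DataFile`, `day_transform`, `prune`, and the `event_ts` column in the comment are hypothetical, not Iceberg's code.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Day partition value: whole days from 1970-01-01, computed in UTC."""
    if ts.tzinfo is None:
        raise ValueError("expects a timezone-aware timestamp")
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


@dataclass(frozen=True)
class DataFile:
    path: str
    partition_day: int  # stored in table metadata; users never query this directly


def prune(files: list[DataFile], start: datetime, end: datetime) -> list[DataFile]:
    """Turn a filter on the timestamp column into a filter on partition values."""
    lo, hi = day_transform(start), day_transform(end)
    return [f for f in files if lo <= f.partition_day <= hi]


if __name__ == "__main__":
    utc = timezone.utc
    files = [
        DataFile("f1.parquet", day_transform(datetime(2024, 5, 1, 8, tzinfo=utc))),
        DataFile("f2.parquet", day_transform(datetime(2024, 5, 2, 9, tzinfo=utc))),
        DataFile("f3.parquet", day_transform(datetime(2024, 5, 3, 10, tzinfo=utc))),
    ]
    # Roughly: WHERE event_ts BETWEEN '2024-05-02 00:00' AND '2024-05-02 23:59'
    hits = prune(files, datetime(2024, 5, 2, tzinfo=utc), datetime(2024, 5, 2, 23, 59, tzinfo=utc))
    print([f.path for f in hits])  # ['f2.parquet']
```

The user never writes a condition on `partition_day`, so there is no separate partition column to forget, which is how hidden partitioning avoids accidental full table scans.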

Maintaining and Auditing Apache Iceberg Tables

Here is a list of resources to help you maintain and audit your Apache Iceberg tables:

Guide to Maintaining an Apache Iceberg Lakehouse
Compaction in Apache Iceberg: Fine-Tuning Your Iceberg Table’s Data Files
Leveraging Apache Iceberg Metadata Tables in Dremio for Effective Data Lakehouse Auditing
What is DataOps? Automating Data Management on the Apache Iceberg Lakehouse
How Z-Ordering in Apache Iceberg Helps Improve Performance
Maintaining Iceberg Tables – Compaction, Expiring Snapshots, and More
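Compaction rewrites many small data files into fewer, larger ones, and Iceberg's Spark `rewrite_data_files` procedure offers a `binpack` strategy for this. The Python sketch below illustrates only the planning idea behind bin-packing, under stated assumptions: the files all belong to one partition, grouping uses a first-fit-decreasing heuristic, and the 75% "small file" threshold and the `plan_compaction` helper are made up for illustration. Iceberg's actual planner has its own options and logic.

```python
MB = 1024 * 1024


def plan_compaction(
    file_sizes: dict[str, int], target_bytes: int, small_fraction: float = 0.75
) -> list[list[str]]:
    """Group small files (one partition) into rewrite groups of at most target_bytes.

    Uses a first-fit-decreasing bin-packing heuristic.
    """
    small = [(p, s) for p, s in file_sizes.items() if s < small_fraction * target_bytes]
    groups: list[list[str]] = []
    totals: list[int] = []
    for path, size in sorted(small, key=lambda ps: ps[1], reverse=True):
        for i, total in enumerate(totals):
            if total + size <= target_bytes:  # first group with room wins
                groups[i].append(path)
                totals[i] += size
                break
        else:  # no existing group has room: open a new one
            groups.append([path])
            totals.append(size)
    # Rewriting a single file on its own would not reduce the file count.
    return [g for g in groups if len(g) > 1]


if __name__ == "__main__":
    sizes = {"a": 100 * MB, "b": 200 * MB, "c": 300 * MB, "d": 50 * MB, "e": 600 * MB}
    print(plan_compaction(sizes, target_bytes=512 * MB))  # [['c', 'b'], ['a', 'd']]
```

Here `e` is already large enough to be left alone, and the four small files collapse into two rewrite groups, each no larger than the target size.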

Apache Iceberg Catalogs

Here is a list of resources to help you learn about Apache Iceberg Catalogs:

The Evolution of Apache Iceberg Catalogs
Introducing the Apache Iceberg Catalog Migration Tool
What Iceberg REST Catalog Is and Isn’t
Why Thinking about Apache Iceberg Catalogs Like Nessie and Apache Polaris (incubating) Matters
Using Nessie’s REST Catalog Support for Working with Apache Iceberg Tables
The Nessie Ecosystem and the Reach of Git for Data for Apache Iceberg
Understanding the Polaris Iceberg Catalog and Its Architecture
Getting Hands-on with Snowflake Managed Polaris
Getting Hands-on with Polaris OSS, Apache Iceberg and Apache Spark

Querying Apache Iceberg Tables

Here is a list of resources to help you query your Apache Iceberg tables:

Query Iceberg Tables on MinIO with Dremio
Run Graph Queries on Apache Iceberg Tables with Dremio & Puppygraph

Hybrid Apache Iceberg Lakehouses

Here is a list of resources about implementing hybrid on-premises and cloud Apache Iceberg lakehouses:

3 Reasons to Create Hybrid Apache Iceberg Data Lakehouses
Hybrid Iceberg Lakehouse Storage Solutions: NetApp
Hybrid Iceberg Lakehouse Storage Solutions: MinIO
Hybrid Iceberg Lakehouse Infrastructure Solutions: VAST Data
Hybrid Lakehouse Storage Solutions: Pure Storage

Apache Iceberg and Other Formats

Here is a list of resources about Apache Iceberg and other formats (Apache Hudi, Apache Paimon, Delta Lake):

Comparing Apache Iceberg to Other Data Lakehouse Solutions
Exploring the Architecture of Apache Iceberg, Delta Lake, and Apache Hudi
Comparison of Data Lake Table Formats (Apache Iceberg, Apache Hudi and Delta Lake)
Table Format Partitioning Comparison: Apache Iceberg, Apache Hudi, and Delta Lake
Table Format Governance and Community Contributions: Apache Iceberg, Apache Hudi, and Delta Lake

Python and Apache Iceberg

Here is a list of resources about Apache Iceberg and Python:

3 Ways to Use Python with Apache Iceberg
PyIceberg Docs

Governing Apache Iceberg Tables

Apache Iceberg and the Right to Be Forgotten

Miscellaneous Apache Iceberg Resources

Here is a list of miscellaneous resources to help you learn Apache Iceberg:

Introduction to the Iceberg Data Lakehouse
The Iceberg Lakehouse: Key Benefits for Your Business
Evolving the Data Lake: From CSV/JSON to Parquet to Apache Iceberg
Data Sharing of Apache Iceberg tables and other data in the Dremio Lakehouse
The Value of Dremio’s Semantic Layer and The Apache Iceberg Lakehouse to the Snowflake User
The Who, What and Why of Data Reflections and Apache Iceberg for Query Acceleration
How Apache Iceberg, Dremio and Lakehouse Architecture can optimize your Cloud Data Platform Costs
Dremio’s Commitment to being the Ideal Platform for Apache Iceberg Data Lakehouses
Open Source and the Data Lakehouse: Apache Arrow, Apache Iceberg, Nessie and Dremio
The Why and How of Using Apache Iceberg on Databricks
Deep Dive Into Configuring Your Apache Iceberg Catalog with Apache Spark
Connecting Tableau to Apache Iceberg Tables with Dremio
Apache Iceberg 101
Apache Iceberg FAQ
Why Data Analysts, Engineers, Architects and Scientists Should Care about Dremio and Apache Iceberg
Data Lake Mysteries Unveiled: Nessie, Dremio, and MinIO Make Waves
