Airflow + AI + Iceberg V3: The New Stack for Scalable Geospatial Data

TOP OF THE STACK
Recently I have talked about Airflow, and I have talked about agentic AI, but not the two together. With the release of the Airflow AI SDK from Astronomer, you can now use both together.
Check out this video from The Data Guy (aka George Yates) about this functionality, and let me know what you would build with it.
And be on the lookout for a new Spatial Stack episode with Kenten Danas from Astronomer all about Airflow and how it can be used for geospatial.
THIS WEEK IN THE LAB
Last week we wrapped up the first in a new series of courses called “bricks,” which covered how to efficiently create cloud-native geospatial files, including COGs and GeoParquet.
That continued this week with a look at Apache Iceberg for spatial data, and next week we will look at automating geospatial processes with Airflow.
The concept behind bricks is that you can choose one, choose a track, or take all of them à la carte and get a verified certification. Or you can join our online community, the Spatial Lab, to get access to each new program, live sessions, and, most importantly, a group of like-minded individuals.
ICEBERG GEOSPATIAL IS LIVE
Apache Iceberg officially released the Version 3 specification, which adds, among other things, geospatial support. There is a ton of excitement around Iceberg and specifically this release. Don’t believe me?
Apache Iceberg format version 3 represents a major leap forward in table format design, bringing new capabilities to support modern and complex data workloads.
V3 introduces native support for advanced data types such as timestamp with timezone, variant for semi-structured data, and geospatial types like geometry and geography.
Okay, so what? This enables richer data modeling across domains like IoT, analytics, and time series geospatial data, all at scale.
Other key enhancements include row-level lineage tracking for auditability and improved schema and partition evolution, all while preserving Iceberg’s core principles of atomic commits, schema evolution, and ACID guarantees.
With these features, V3 really does expand what’s possible in a lakehouse architecture while maintaining the flexibility, scalability, and openness that have made Iceberg a foundational component of the modern data stack.
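To make the geometry type a little more concrete, here is a minimal sketch of what a single point value looks like when serialized as Well-Known Binary (WKB). This assumes WKB is the encoding for geometry values, which is my understanding of the V3 spec. The function names, the `row` dictionary, and the coordinates are hypothetical and only for illustration. The sketch uses the Python standard library, not an Iceberg client.

```python
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # Layout: 1 byte for byte order (1 = little-endian),
    # 4 bytes for geometry type (1 = Point),
    # then x and y as 64-bit floats.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(data: bytes) -> tuple[float, float]:
    """Decode a 2D WKB point back into (x, y)."""
    # The first byte says how the remaining fields are ordered.
    fmt = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(fmt + "Idd", data[1:21])
    if geom_type != 1:
        raise ValueError(f"expected a Point (type 1), got type {geom_type}")
    return x, y


# Hypothetical row: one GPS reading with a geometry value stored as bytes.
row = {"device_id": "sensor-1", "geom": point_to_wkb(-73.9857, 40.7484)}
print(len(row["geom"]), wkb_to_point(row["geom"]))
```

In practice an engine handles this encoding for you, so the takeaway is only that a geometry column holds compact binary values rather than text.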
Most of these platforms are in the process of adopting V3, since it was only released on June 19th, 2025. For now, the Wherobots community edition is the easiest way to get started creating Iceberg tables with full spatial support.
If you want to see how Iceberg works alongside GPS data for large-scale map-matching, check out this upcoming LinkedIn Live with some of my amazing colleagues.

But you can start to learn a bit more about Iceberg now. We just launched the Iceberg brick in the Spatial Lab so if you are interested you can get early bird access here.
And if you are in the New York City area…
I will be giving a talk about Iceberg for Geospatial at the NYC Apache Iceberg Community Meetup, with 170+ people attending, including folks from AWS, Microsoft, Databricks, PuppyGraph, ClickHouse, GrubHub, RisingWave, Confluent, Dremio, and more.
The event is on Thursday, July 10th from 3:00 to 9:00 pm (my talk is at 4:15 pm). Register for the event here!

NEW VIDEO/PODCAST
Speaking of Iceberg and GIS data silos, this week I pushed a new podcast episode live with my long-time friend and colleague Javier de la Torre, founder of CARTO.
We talk about all the things above and why it is imperative to move spatial data out of the locked silos it has traditionally existed in and into the modern data stack.
LEARN ONE THING
If you are brand new to Iceberg, and want to understand the implications for geospatial, then check this video out.