Article

How to Build a Cloud-Native Spatial Data Lakehouse

July 3, 2025 Matt Forrest

Most spatial workflows today are still running on a foundation of folders, flat files, and fragile scripts. You’ve probably worked with shapefiles stored in six different places, Python notebooks that quietly break when a column changes, and a dozen versions of the same dataset ending in _final_v2_edit.shp. I’ve been there.

But spatial data has changed. It’s bigger, more dynamic, and needs to be accessible across teams and tools. What hasn’t changed fast enough is how we manage it.

That’s where the spatial data lakehouse comes in.

At its core, a lakehouse architecture combines the flexibility of cloud object storage with the performance and structure of a warehouse. It lets you work with data like both an engineer and an analyst: at scale, with full versioning, and without being locked into one platform.

Here’s what you need to build one:

  • Cloud-native storage like Amazon S3 or Google Cloud Storage
  • Open formats like GeoParquet for vector and COG for raster
  • Apache Iceberg as your table format to manage schema, partitioning, and time travel
  • Engines like Apache Sedona, Spark, or Wherobots to query your data with SQL
  • Apache Airflow to automate ingestion, processing, and updates

This stack doesn’t care what software you use. It’s open, flexible, and built for interoperability. You can load data with Python, query it with SQL, join it across years or sources, and share it across teams without duplicating files or losing control of your schema.
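
To make the "query it with SQL" part concrete, here is a minimal sketch in Python. It only composes the SQL text you might hand to a Spark session that has Sedona and Iceberg configured; it does not connect to anything. The table names (lake.gis.buildings, lake.gis.flood_zones), the column names, and the helper flood_exposure_query are hypothetical. ST_Intersects is a Sedona SQL function, and TIMESTAMP AS OF is the time-travel clause Iceberg documents for Spark SQL, but check both against the versions you actually run.

```python
from datetime import datetime


def flood_exposure_query(buildings_table: str, flood_table: str, as_of: datetime) -> str:
    """Compose a spatial join that reads the buildings table as of a past timestamp.

    Assumes both tables expose `id`/`zone` and a `geometry` column the engine
    treats as a geometry type. All names here are illustrative.
    """
    ts = as_of.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "SELECT b.id, f.zone\n"
        f"FROM {buildings_table} TIMESTAMP AS OF '{ts}' AS b\n"
        f"JOIN {flood_table} AS f\n"
        "  ON ST_Intersects(b.geometry, f.geometry)"
    )


# Hypothetical table names; swap in your own catalog and schema.
query = flood_exposure_query(
    "lake.gis.buildings",
    "lake.gis.flood_zones",
    datetime(2025, 6, 1),
)
print(query)
```

In a real pipeline you would pass the resulting string to your engine of choice. The point is that the join runs against tables in place, so nothing gets exported or copied first.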

I recently walked through this in a LinkedIn carousel, including how this setup lets you do things like:

  • Automatically update a building footprint dataset every month
  • Run spatial joins against flood zones without copying files
  • Version your tables so you can roll back mistakes
  • Build pipelines that scale from your laptop to the cloud
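
The "roll back mistakes" idea is easier to see in a toy model. The sketch below is plain Python, not the Iceberg API: the VersionedTable class, its methods, and the sample rows are all made up for illustration. It keeps a full copy of the rows per snapshot, whereas a real table format tracks metadata that points at data files, but the behavior is the same in spirit: every write creates a new snapshot, and a rollback just moves the "current" pointer.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy snapshot log. Illustrative only; this is not the Iceberg API."""

    # Snapshot 0 is the empty table; every commit appends a new immutable snapshot.
    snapshots: list = field(default_factory=lambda: [()])
    current: int = 0

    def commit(self, rows):
        """Store the rows as a new snapshot and make it the current one."""
        self.snapshots.append(tuple(rows))
        self.current = len(self.snapshots) - 1
        return self.current

    def read(self, snapshot_id=None):
        """Read the current snapshot, or an older one by id (time travel)."""
        sid = self.current if snapshot_id is None else snapshot_id
        return list(self.snapshots[sid])

    def rollback(self, snapshot_id):
        """Point the table back at an earlier snapshot without deleting history."""
        if not 0 <= snapshot_id < len(self.snapshots):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self.current = snapshot_id


# Hypothetical rows: (building_id, footprint_area_m2)
table = VersionedTable()
june = table.commit([("b1", 120.5), ("b2", 88.0)])
july = table.commit([("b1", 120.5)])  # a bad monthly load that dropped b2

table.rollback(june)          # undo the bad load
restored = table.read()       # current data is the June snapshot again
bad_load = table.read(july)   # the July snapshot is still there for debugging
```

Because the bad snapshot is kept rather than overwritten, you can still inspect what went wrong after rolling back.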

If you’re building anything with geospatial data and you’re tired of managing brittle scripts and bloated folders, this is the architecture you’ve been waiting for.

This isn’t the future of GIS; it’s already happening. And it’s time more spatial teams got to work this way.
