The Top 11 Open GeoParquet Datasets: Making big geospatial data easy

January 18, 2024 Matt Forrest

Data formats shape how we work with and interpret spatial information, and GeoParquet marks a significant milestone: it offers a more efficient and accessible way to handle large spatial datasets. This post surveys notable open data sources available in GeoParquet format, each a useful resource for a range of applications.
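
For readers new to the format, it may help to see what sets a GeoParquet file apart from a plain Parquet file: a small JSON document stored in the file metadata under the key "geo". The sketch below builds such a document and runs a few basic checks on it. The bounding box and geometry types are made-up example values, and check_geo_metadata is a hypothetical helper written for this post, not part of any library.

```python
import json

# Hypothetical example values. A real file stores this JSON under the
# "geo" key of the Parquet file-level key/value metadata.
geo_metadata = {
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Polygon", "MultiPolygon"],
            "bbox": [166.4, -47.3, 178.6, -34.4],  # made-up xmin, ymin, xmax, ymax
        }
    },
}

REQUIRED_TOP_LEVEL = ("version", "primary_column", "columns")
REQUIRED_PER_COLUMN = ("encoding", "geometry_types")


def check_geo_metadata(meta):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in meta:
            problems.append(f"missing top-level key: {key}")
    columns = meta.get("columns", {})
    primary = meta.get("primary_column")
    if primary is not None and primary not in columns:
        problems.append(f"primary column {primary!r} not described in 'columns'")
    for name, column in columns.items():
        for key in REQUIRED_PER_COLUMN:
            if key not in column:
                problems.append(f"column {name!r} missing key: {key}")
    return problems


# Round-trip through JSON, as the metadata is stored as a JSON string.
serialized = json.dumps(geo_metadata)
problems = check_geo_metadata(json.loads(serialized))
print(problems or "basic geo metadata checks passed")
```

This only checks that the required keys are present. It does not validate coordinate reference systems or the geometry bytes themselves.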

Microsoft Planetary Computer STAC Items & Building Footprints

Imagine being able to access a vast catalog of Earth’s satellite imagery and geospatial data with the click of a button. That’s what Microsoft’s Planetary Computer offers. It’s like having a digital atlas of the globe at your fingertips, but with a level of detail and accessibility that traditional atlases can’t match. Its inclusion of Building Footprints in GeoParquet format is particularly notable: it aids urban planning and development projects and paves the way for innovative environmental studies.
Microsoft Planetary Computer Quickstart Guide

NZ-Building-Outlines

The ‘nz-building-outlines.parquet’ dataset is a testament to the versatility of GeoParquet. Originally in GeoJSON format from the LINZ Data Service, this dataset’s conversion mirrors the transformation of a hand-drawn map into a digital masterpiece, seamlessly integrating into the world of big data analytics. It’s like watching the old world of cartography shake hands with the new age of digital data processing.


LINZ Data Service

Google-Microsoft Open Buildings – combined by VIDA


This dataset is a monumental fusion of Google’s Open Buildings and Microsoft’s Building Footprints, creating a comprehensive collection of over 2.5 billion footprints. The scope of this dataset is akin to a grand library of global architecture, offering an unprecedented overview of building patterns across the world. For researchers, urban planners, and historians, this dataset is nothing short of a treasure trove, offering insights that were once thought impossible to obtain.


Google-Microsoft Open Buildings on VIDA

EuroCrops


EuroCrops is a dataset that combines agricultural data from across the European Union, transforming it into a unified, cloud-native format. This dataset serves as a digital cornucopia of agricultural information, offering over 20 million harmonized field boundaries. It’s a vital resource for agricultural research, policy making, and land management strategies in Europe, providing a detailed overview of the continent’s agricultural landscape.


EuroCrops Dataset

Google Open Buildings


Google’s Open Buildings dataset is an architectural canvas, showcasing the geometry of buildings across most of Africa, South Asia, and Southeast Asia. This dataset is not just a collection of footprints; it’s a narrative of the regions it covers, telling stories of urban development, demographic changes, and architectural evolution. The partitioning by admin boundaries in cloud-native formats like PMTiles and GeoParquet enhances its usability, making it a go-to resource for urban developers and demographers.


Google Open Buildings Dataset
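
Partitioning by admin boundary pays off because a reader can skip every file outside the area of interest. Here is a minimal sketch of that idea, assuming a Hive-style key=value directory layout. The file listing, the country_iso key, and both helper functions are hypothetical; the real dataset's paths and key names may differ.

```python
from pathlib import PurePosixPath

# Hypothetical file listing; the real dataset's layout may differ.
FILES = [
    "buildings/country_iso=KE/part-0.parquet",
    "buildings/country_iso=KE/part-1.parquet",
    "buildings/country_iso=NG/part-0.parquet",
    "buildings/country_iso=VN/part-0.parquet",
]


def partition_values(path):
    """Extract key=value directory segments from a Hive-style path."""
    values = {}
    for segment in PurePosixPath(path).parts[:-1]:  # skip the file name
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


def prune(files, **wanted):
    """Keep only files whose partition values match every requested key."""
    return [
        f
        for f in files
        if all(partition_values(f).get(k) == v for k, v in wanted.items())
    ]


# Only the Kenya files need to be opened; the rest are never read.
kenya_files = prune(FILES, country_iso="KE")
print(kenya_files)
```

Query engines that understand partitioned Parquet apply the same kind of filter for you. The point of the sketch is only to show why the layout saves work.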

Taxi Zones from TLC Trip Record Data


This dataset provides a unique perspective on urban transportation, detailing the Taxi Zones of New York City. It’s like having a street-level view of the city’s bustling taxi network, transformed into a format that’s ripe for analysis in urban planning and transportation studies. The inclusion of GeoParquet and PMTiles versions of the original Shapefile exemplifies the adaptability of GeoParquet in various urban datasets.


TLC Trip Record Data
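
The zones become useful once they are joined to trips: TLC trip records refer to zones through the PULocationID and DOLocationID columns, which match the LocationID of each taxi zone. The sketch below counts pickups per zone using plain Python structures. The zone names and trip rows are invented for illustration, and pickups_per_zone is a hypothetical helper; in practice the zone attributes would come from the GeoParquet file.

```python
from collections import Counter

# Illustrative rows only; real zone attributes come from the GeoParquet file.
zones = [
    {"LocationID": 1, "zone": "Zone A"},
    {"LocationID": 2, "zone": "Zone B"},
    {"LocationID": 3, "zone": "Zone C"},
]

# Made-up trips; 99 deliberately has no matching zone.
trips = [
    {"PULocationID": 1, "DOLocationID": 2},
    {"PULocationID": 1, "DOLocationID": 3},
    {"PULocationID": 2, "DOLocationID": 1},
    {"PULocationID": 3, "DOLocationID": 1},
    {"PULocationID": 1, "DOLocationID": 1},
    {"PULocationID": 99, "DOLocationID": 2},
]


def pickups_per_zone(trips, zones):
    """Count pickups per zone name, joining on the shared location ID."""
    names = {z["LocationID"]: z["zone"] for z in zones}
    counts = Counter(t["PULocationID"] for t in trips)
    result = Counter()
    for location_id, n in counts.items():
        # Trips whose ID has no matching zone are grouped under "Unknown".
        result[names.get(location_id, "Unknown")] += n
    return result


pickup_counts = pickups_per_zone(trips, zones)
for zone_name, n in pickup_counts.most_common():
    print(zone_name, n)
```

Because the join key is a simple integer, the geometry only needs to be read once for mapping; the counting itself never touches it.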

Overture Open Buildings


Overture’s experimental re-distribution of their buildings dataset in ‘Cloud-Native Geospatial’ formats is a step towards democratizing building data. It’s an initiative that provides detailed building footprints, essentially sketching the urban landscapes in a digital format that’s easily accessible and analyzable. This dataset opens up new avenues for urban studies and planning, offering a fresh perspective on the built environment.


Overture Open Buildings

GLanCE: A Global Land Cover Training Dataset


The GLanCE dataset is a chronicle of the Earth’s changing landscapes, spanning from 1984 to 2020. It’s a digital time capsule that contains nearly 2 million training units for land cover classes globally. For those involved in environmental monitoring, land use research, and climate studies, this dataset offers a medium-resolution window into the past, presenting an opportunity to study the evolution of global ecosystems over three and a half decades.


GLanCE Land Cover Dataset

National Surface Depressions

This dataset delineates depressions from 10-m DEMs across the United States, offering a unique perspective on geological formations and environmental features. It’s akin to having a detailed contour map of the nation’s terrain, providing invaluable data for geological and environmental studies.

National Surface Depressions Dataset
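
How do you find a depression in a DEM in the first place? The method used for this dataset is not described here, so the sketch below shows one common approach as an assumption: priority-flood filling, where each cell is raised to the lowest level at which water could drain to the grid edge, and the depression depth is the filled surface minus the original. The tiny grid is made up, and a real 10-m DEM would call for an optimized raster library rather than nested Python lists.

```python
import heapq


def fill_depressions(dem):
    """Priority-flood fill: raise each cell to the lowest level at which
    water could drain to the grid edge. Returns a new filled grid."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    visited = [[False] * cols for _ in range(rows)]
    heap = []

    # Seed the queue with every edge cell; edges are assumed to drain freely.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (dem[r][c], r, c))
                visited[r][c] = True

    # Always expand from the lowest known cell, moving inward (8 neighbours).
    while heap:
        level, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                    visited[nr][nc] = True
                    # A neighbour below the current spill level is raised to it.
                    filled[nr][nc] = max(dem[nr][nc], level)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled


def depression_depths(dem):
    """Depth of ponding per cell: filled elevation minus original elevation."""
    filled = fill_depressions(dem)
    return [[f - d for f, d in zip(frow, drow)] for frow, drow in zip(filled, dem)]


# Made-up elevations: a small basin whose lowest way out is at level 4.
dem = [
    [5, 5, 5, 5, 5],
    [5, 2, 3, 4, 5],
    [5, 3, 1, 4, 5],
    [5, 4, 4, 4, 3],
    [5, 5, 5, 5, 5],
]
depths = depression_depths(dem)
for row in depths:
    print(row)
```

Cells with a depth greater than zero form the depression. Turning those cells into polygons is the step that produces vector features like the ones in this dataset.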

Non-Floodplain Wetlands

Detailing the extent of Geographically Isolated Wetlands in the US, this dataset is a vital tool for conservationists and ecologists. It’s like having a digital magnifying glass over the country’s wetland ecosystems, offering a detailed view of these critical habitats.

Non-Floodplain Wetlands Dataset

National Wetlands Inventory

The NWI offers an exhaustive look at US wetlands, providing a comprehensive resource for conservation and ecological research. This dataset is not just a collection of data; it’s a narrative of the country’s wetland ecosystems, their distribution, and characteristics.

National Wetlands Inventory

Energy Performance Certificate Ratings (Domestic) – England and Wales

This dataset provides a window into the energy efficiency of properties across England and Wales. It’s an essential tool for understanding the environmental impact and energy consumption patterns of domestic properties. The dataset not only offers data on energy efficiency but also includes recommendations for energy-efficient improvements, making it a valuable resource for environmental studies and energy conservation initiatives.

EPC Ratings Dataset

Each of these datasets, available in the GeoParquet format, represents a significant advancement in the field of geospatial data. They provide a more accessible and efficient means of handling and analyzing large spatial datasets, paving the way for a deeper understanding of our world and its many facets. Whether you’re a researcher, urban planner, or simply a data enthusiast, these resources offer a wealth of information and possibilities for exploration. Happy mapping!

