The Top 11 Open GeoParquet Datasets: Making big geospatial data easy

Data formats shape how we work with and interpret spatial information. GeoParquet marks a significant milestone: it brings geospatial data to the columnar Apache Parquet format, offering a more efficient and accessible way to handle large spatial datasets. This blog post explores notable open data sources available in GeoParquet format, each a valuable resource for a range of applications.
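To make the format a little more concrete, here is a minimal sketch of the two ingredients that turn a Parquet file into GeoParquet: geometries stored as WKB bytes, and a JSON document stored under the file-level `geo` metadata key. It uses only the Python standard library and does not write an actual Parquet file; the helper names and the sample coordinate are illustrative assumptions, not part of any dataset listed here.

```python
import json
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # 1 byte byte-order flag (1 = little endian), uint32 geometry type
    # (1 = Point), then the x and y coordinates as two doubles.
    return struct.pack("<BIdd", 1, 1, x, y)


def geo_metadata(column: str = "geometry", geometry_types=("Point",)) -> dict:
    """Build the minimal required fields of the GeoParquet 'geo' metadata."""
    return {
        "version": "1.0.0",
        "primary_column": column,
        "columns": {
            column: {
                "encoding": "WKB",
                "geometry_types": list(geometry_types),
            }
        },
    }


# Illustrative coordinate only (longitude, latitude).
wkb = point_to_wkb(174.7762, -41.2865)
meta_json = json.dumps(geo_metadata())
```

In practice a library such as GeoPandas handles both steps when it writes the file, so this is only meant to show what sits underneath.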

Microsoft Planetary Computer STAC Items & Building Footprints

Imagine a world where you can access a vast expanse of Earth’s satellite imagery and geospatial data with the click of a button. That’s what Microsoft’s Planetary Computer offers. It’s like having a digital atlas of the globe at your fingertips, but with a level of detail and accessibility that traditional atlases can’t match. Their integration of Building Footprints in GeoParquet format is particularly notable. This not only aids in urban planning and development projects but also paves the way for innovative environmental studies.
Microsoft Planetary Computer Quickstart Guide



NZ Building Outlines

The ‘nz-building-outlines.parquet’ dataset is a testament to the versatility of GeoParquet. It was converted from the GeoJSON originally published by the LINZ Data Service, which makes the same building outlines much easier to bring into big data analytics. It’s like watching the old world of cartography shake hands with the new age of digital data processing.

LINZ Data Service
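A conversion like the one described above can be sketched in a few lines with GeoPandas. This is a minimal sketch, not the process actually used for this dataset: it assumes GeoPandas and a Parquet engine such as pyarrow are installed, the function names are hypothetical, and the input file name in the usage note is an assumption.

```python
from pathlib import Path


def parquet_path_for(source: str) -> Path:
    """Return the output path: same name as the source, with a .parquet suffix."""
    return Path(source).with_suffix(".parquet")


def convert_to_geoparquet(source: str) -> Path:
    """Read a vector file (GeoJSON, Shapefile, ...) and write it as GeoParquet.

    Hypothetical helper. The import is local so the path helper above
    stays usable even where GeoPandas is not installed.
    """
    import geopandas as gpd  # third-party dependency

    gdf = gpd.read_file(source)   # load geometries and attributes
    out = parquet_path_for(source)
    gdf.to_parquet(out)           # GeoPandas writes the 'geo' metadata
    return out
```

Calling `convert_to_geoparquet("nz-building-outlines.geojson")` would write `nz-building-outlines.parquet` next to the input, provided a file with that (assumed) name exists locally.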

Google-Microsoft Open Buildings – combined by VIDA

This dataset is a monumental fusion of Google’s Open Buildings and Microsoft’s Building Footprints, creating a comprehensive collection of over 2.5 billion footprints. The scope of this dataset is akin to a grand library of global architecture, offering an unprecedented overview of building patterns across the world. For researchers, urban planners, and historians, this dataset is nothing short of a treasure trove, offering insights that were once thought impossible to obtain.

Google-Microsoft Open Buildings on VIDA


EuroCrops

EuroCrops is a dataset that combines agricultural data from across the European Union, transforming it into a unified, cloud-native format. It serves as a digital cornucopia of agricultural information, offering over 20 million harmonized field boundaries. It’s a vital resource for agricultural research, policymaking, and land management strategies in Europe, providing a detailed overview of the continent’s agricultural landscape.

EuroCrops Dataset

Google Open Buildings

Google’s Open Buildings dataset is an architectural canvas, showcasing the geometry of buildings across most of Africa, South Asia, and Southeast Asia. This dataset is not just a collection of footprints; it’s a narrative of the regions it covers, telling stories of urban development, demographic changes, and architectural evolution. It is partitioned by administrative boundaries and distributed in cloud-native formats like PMTiles and GeoParquet, so users can fetch only the areas they need, making it a go-to resource for urban developers and demographers.

Google Open Buildings Dataset
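Partitioning pays off because a reader interested in one area only has to open the files that cover it. The sketch below shows that idea under stated assumptions: each partition file is paired with a bounding box (GeoParquet's column metadata has an optional `bbox` field that can carry this), and the file names and coordinates are entirely hypothetical rather than the real layout of this dataset.

```python
from typing import Dict, List, Tuple

# Bounding box as (min_x, min_y, max_x, max_y).
BBox = Tuple[float, float, float, float]


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """True when the two boxes overlap or touch on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partitions_for(query: BBox, index: Dict[str, BBox]) -> List[str]:
    """Return, sorted by name, the partition files whose bbox meets the query."""
    return sorted(
        name for name, bbox in index.items() if bboxes_intersect(query, bbox)
    )


# Hypothetical partition index: file name -> bounding box of its contents.
PARTITION_INDEX: Dict[str, BBox] = {
    "region_a.parquet": (0.0, 0.0, 10.0, 10.0),
    "region_b.parquet": (10.5, 0.0, 20.0, 10.0),
    "region_c.parquet": (0.0, 10.5, 10.0, 20.0),
}

# A query window straddling regions A and B but not C.
needed = partitions_for((8.0, 2.0, 12.0, 4.0), PARTITION_INDEX)
```

Here `needed` contains only `region_a.parquet` and `region_b.parquet`, so the third file is never downloaded or scanned.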

Taxi Zones from TLC Trip Record Data

This dataset provides a unique perspective on urban transportation, detailing the Taxi Zones of New York City. It’s like having a street-level view of the city’s bustling taxi network, transformed into a format that’s ripe for analysis in urban planning and transportation studies. The inclusion of GeoParquet and PMTiles versions of the original Shapefile exemplifies the adaptability of GeoParquet in various urban datasets.

TLC Trip Record Data

Overture Open Buildings

Overture’s experimental re-distribution of their buildings dataset in ‘Cloud-Native Geospatial’ formats is a step towards democratizing building data. It’s an initiative that provides detailed building footprints, essentially sketching the urban landscapes in a digital format that’s easily accessible and analyzable. This dataset opens up new avenues for urban studies and planning, offering a fresh perspective on the built environment.

Overture Open Buildings

GLanCE: A Global Land Cover Training Dataset

The GLanCE dataset is a chronicle of the Earth’s changing landscapes, spanning from 1984 to 2020. It’s a digital time capsule that contains nearly 2 million training units for land cover classes globally. For those involved in environmental monitoring, land use research, and climate studies, this dataset offers a medium-resolution window into the past, presenting an opportunity to study the evolution of global ecosystems over three and a half decades.

GLanCE Land Cover Dataset

National Surface Depressions

This dataset delineates surface depressions from 10-m DEMs (digital elevation models) across the United States, offering a unique perspective on geological formations and environmental features. It’s akin to having a detailed contour map of the country’s terrain, providing invaluable data for geological and environmental studies.

National Surface Depressions Dataset

Non-Floodplain Wetlands

Detailing the extent of Geographically Isolated Wetlands in the US, this dataset is a vital tool for conservationists and ecologists. It’s like having a digital magnifying glass over the country’s wetland ecosystems, offering a detailed view of these critical habitats.

Non-Floodplain Wetlands Dataset

National Wetlands Inventory


The NWI offers an exhaustive look at US wetlands, providing a comprehensive resource for conservation and ecological research. This dataset is not just a collection of data; it’s a narrative of the country’s wetland ecosystems, their distribution, and characteristics.

National Wetlands Inventory

Energy Performance Certificate Ratings (Domestic) – England and Wales

This dataset provides a window into the energy efficiency of properties across England and Wales. It’s an essential tool for understanding the environmental impact and energy consumption patterns of domestic properties. The dataset not only offers data on energy efficiency but also includes recommendations for energy-efficient improvements, making it a valuable resource for environmental studies and energy conservation initiatives.

EPC Ratings Dataset

Each of these datasets, available in the GeoParquet format, represents a significant advancement in the field of geospatial data. They provide a more accessible and efficient means of handling and analyzing large spatial datasets, paving the way for a deeper understanding of our world and its many facets. Whether you’re a researcher, urban planner, or simply a data enthusiast, these resources offer a wealth of information and possibilities for exploration. Happy mapping!