How to Use Source Cooperative for Cloud-Native Geospatial Data: A Complete Tutorial for QGIS, GeoPandas, DuckDB, PostGIS, and Apache Sedona

July 3, 2026, by Matt Forrest

Cloud-native geospatial workflows are finally becoming accessible to everyday GIS analysts.

Instead of downloading giant zip files or managing your own S3 buckets, you can now stream geospatial data directly into your tools using modern formats like GeoParquet, COG, and PMTiles.

One of the platforms leading this shift is Source Cooperative, a data sharing project supported by Radiant Earth.

This tutorial walks through how to use Source Cooperative with tools you already know, including QGIS, GeoPandas, DuckDB, PostGIS, and Apache Sedona.

If you’re looking for a practical guide to cloud-native GIS, this is a great place to start.

What Is Source Cooperative?

Source Cooperative is a cloud-native geospatial data platform that lets organizations and individuals share data openly using simple web endpoints.

There are no APIs to manage and no infrastructure to deploy. You access data using standard HTTP or S3 paths.

This matters because:

You don’t have to download entire datasets
You don’t need to maintain an S3 bucket
You can read cloud data directly into your GIS tools
Many datasets are already stored in modern formats like GeoParquet and PMTiles
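To make the first point concrete: cloud-native readers typically ask the server for a byte range rather than the whole file. Here is a minimal sketch using only Python's standard library. The helper names `range_request` and `fetch_range` are mine for illustration, not part of any Source Cooperative client, and the sketch assumes the server honors HTTP Range headers.

```python
import urllib.request


def range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET request for bytes start..end (inclusive) of a remote file."""
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Download only the requested slice (needs network access)."""
    with urllib.request.urlopen(range_request(url, start, end)) as resp:
        return resp.read()
```

Calling `fetch_range(url, 0, 1023)` would pull just the first kilobyte. A server that supports ranges normally answers with status 206 (Partial Content) instead of 200.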

You can browse the available datasets directly on the site and filter them by keyword or file type. For this tutorial, we use EuroCrops, a high-resolution pan-European cropland dataset.

Key Cloud-Native Formats Used in This Tutorial

GeoParquet

A columnar, compressed vector format built on Apache Parquet. It suits large vector datasets because a reader can fetch only the columns and row groups it needs.
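Part of what makes Parquet friendly to remote reads is that its metadata sits at the end of the file: the last 8 bytes hold a 4-byte little-endian footer length followed by the magic bytes `PAR1`. The sketch below parses that tail. The function name `parquet_footer_length` is hypothetical, and real readers such as pyarrow handle all of this for you.

```python
import struct


def parquet_footer_length(tail: bytes) -> int:
    """Return the footer size in bytes, given the last 8 bytes of a Parquet file."""
    if len(tail) < 8 or tail[-4:] != b"PAR1":
        raise ValueError("not the tail of a Parquet file")
    # Bytes -8..-4 are the footer length as an unsigned little-endian integer
    (length,) = struct.unpack("<I", tail[-8:-4])
    return length
```

Once a reader knows the footer length, it can fetch the footer and learn where each column chunk lives before requesting any actual data.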

PMTiles

A single-file tile archive that clients read with HTTP range requests, so web maps can be served from plain file storage without a tile server.
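Because everything lives in one file, a client can recognize a PMTiles archive from its first few bytes. This small sketch assumes the version 3 layout, where the file opens with the ASCII text `PMTiles` followed by a one-byte version number. The function name is mine, not from the PMTiles libraries.

```python
def pmtiles_version(header: bytes) -> int:
    """Return the spec version from the first 8 bytes of a PMTiles archive."""
    if header[:7] != b"PMTiles":
        raise ValueError("missing PMTiles magic bytes")
    # The byte right after the magic string is the version number
    return header[7]
```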

COG (Cloud-Optimized GeoTIFF)

A raster format that supports partial reads, making cloud-based access efficient.

Step 1: Explore and Download Data

Inside the EuroCrops dataset you’ll see folders for:

GeoParquet (projected and unprojected)
PMTiles
FlatGeobuf
Tile previews

You can download any file directly with a single click, or you can work with the data hosted on Source Cooperative.

For example, the URL for a GeoParquet file looks like this:

<https://data.source.coop/cholmes/euroccrops/GeoParquetProjected/fr_2021_ec.parquet>

You can load that directly into tools without saving anything locally.
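Different tools want that location written in different ways. The sketch below derives the variants from the HTTPS URL. It assumes the S3-style paths map onto the URL path exactly as they appear in Steps 4 to 6 of this tutorial, so treat the mapping as an illustration and check the dataset page for the authoritative paths. `access_paths` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def access_paths(https_url: str) -> dict:
    """Derive tool-specific paths from a data.source.coop HTTPS URL."""
    key = urlparse(https_url).path.lstrip("/")
    return {
        "https": https_url,
        "gdal_http": f"/vsicurl/{https_url}",  # GDAL over plain HTTP(S)
        "gdal_s3": f"/vsis3/{key}",            # form used in Step 4
        "s3": f"s3://{key}",                   # form used in Steps 5 and 6
    }


paths = access_paths(
    "https://data.source.coop/cholmes/euroccrops/GeoParquetProjected/fr_2021_ec.parquet"
)
print(paths["gdal_s3"])
```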

Step 2: Load Data in Felt (Easiest Option)

Felt supports drag-and-drop uploads of GeoParquet.

You can also add PMTiles using the “Add from URL” feature.

This gives you an instant interactive map without any extra configuration, which is perfect for previewing large datasets.

Step 3: Load Data in QGIS

Windows and Linux

QGIS already ships with GeoParquet support. Just drag the file in.

macOS

You must install the GeoParquet driver manually using Conda.

Alternatively, you can load the data into PostGIS (next step) and connect QGIS to the database.
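If you are not sure whether your QGIS build can read GeoParquet, one way to check is to ask GDAL for its Parquet driver. This is a small sketch you could paste into the QGIS Python console. The function name is mine, and it returns `None` when GDAL's Python bindings are not installed at all.

```python
def has_parquet_driver():
    """Return True/False for Parquet support, or None if GDAL's Python bindings are missing."""
    try:
        from osgeo import ogr
    except ImportError:
        return None
    # GetDriverByName returns None when the driver is not compiled in
    return ogr.GetDriverByName("Parquet") is not None


print(has_parquet_driver())
```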

Step 4: Load Source Cooperative Data into PostGIS

Using GDAL/OGR, you can load GeoParquet directly into a PostGIS database.

Because Source Cooperative uses authenticated S3 endpoints, export your credentials as environment variables first:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
AWS_DEFAULT_REGION
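A missing variable usually surfaces as a confusing access error later on, so it can be worth checking up front. Here is a minimal sketch. The helper `missing_aws_vars` is hypothetical, and it simply treats all four variables listed above as required.

```python
import os

AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_DEFAULT_REGION",
)


def missing_aws_vars(env=None) -> list:
    """List the AWS variables that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in AWS_VARS if not env.get(name)]


print(missing_aws_vars())  # an empty list means all four are set
```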

Then use the GDAL virtual file system path:

/vsis3/cholmes/euroccrops/GeoParquetProjected/fr_2021_ec.parquet

Finally, load into PostGIS:

ogr2ogr -f PostgreSQL \
  PG:"host=localhost dbname=postgis user=postgres password=..." \
  /vsis3/cholmes/euroccrops/GeoParquetProjected/fr_2021_ec.parquet \
  -nln eurocrops_fr

Once imported, you can visualize the table immediately in QGIS.
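If you expect to repeat this import for more files, it can help to wrap the ogr2ogr call in a small script. The sketch below assembles the same arguments as the command above and runs them with `subprocess`. Both function names are mine, and it assumes `ogr2ogr` is on your PATH.

```python
import subprocess


def ogr2ogr_args(pg_dsn: str, source: str, table: str) -> list:
    """Assemble the ogr2ogr call from this step as an argument list."""
    return ["ogr2ogr", "-f", "PostgreSQL", f"PG:{pg_dsn}", source, "-nln", table]


def load_to_postgis(pg_dsn: str, source: str, table: str) -> None:
    """Run ogr2ogr and raise CalledProcessError if it exits with an error."""
    subprocess.run(ogr2ogr_args(pg_dsn, source, table), check=True)
```

Passing the arguments as a list avoids shell quoting problems around the `PG:` connection string.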

Step 5: Use Source Cooperative Data with GeoPandas

GeoPandas can read GeoParquet from S3 with only a few lines of code:

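import os
import geopandas as gpd  # reading from S3 also requires the s3fs package

# Reuse the credentials exported in Step 4
ACCESS_KEY = os.environ["AWS_ACCESS_KEY_ID"]
SECRET_KEY = os.environ["AWS_SECRET_ACCESS_KEY"]
SESSION_TOKEN = os.environ["AWS_SESSION_TOKEN"]

gdf = gpd.read_parquet(
    "s3://cholmes/euroccrops/GeoParquetProjected/fr_2021_ec.parquet",
    storage_options={
        "key": ACCESS_KEY,
        "secret": SECRET_KEY,
        "token": SESSION_TOKEN
    }
)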

You then visualize it with:

gdf.plot(figsize=(10, 10))

GeoPandas also supports partitioned datasets, which is extremely useful for large catalogs like Overture Maps.

Step 6: Use the Data in DuckDB (with Ibis or SQL)

DuckDB now supports:

Reading GeoParquet
Reading S3-hosted files
Spatial operations through DuckDB-Spatial
Working with millions of features in memory

Using Ibis

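import ibis

# In-memory DuckDB connection with the spatial extension loaded
con = ibis.duckdb.connect()
con.raw_sql("INSTALL spatial; LOAD spatial;")

# Extra keyword arguments are forwarded to DuckDB's read_parquet;
# "aws_secrets" is assumed to name credentials you have already configured
table = con.read_parquet(
    "s3://cholmes/euroccrops/GeoParquetProjected/*.parquet",
    secret="aws_secrets"
)

table.head()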

Using pure SQL

SELECT COUNT(*)
FROM read_parquet('s3://cholmes/euroccrops/GeoParquetProjected/*.parquet');

Counting ~22 million polygons takes only a few seconds.
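A natural follow-up is to see how those rows break down by file. DuckDB's `read_parquet` can add a `filename` column for exactly this. The sketch below builds that query and, optionally, runs it through the `duckdb` Python package. The function names are mine, and running it assumes your S3 credentials are already configured for DuckDB.

```python
def count_by_file_sql(parquet_glob: str) -> str:
    """SQL that counts rows per source file using DuckDB's filename column."""
    return (
        "SELECT filename, COUNT(*) AS n "
        f"FROM read_parquet('{parquet_glob}', filename = true) "
        "GROUP BY filename ORDER BY n DESC"
    )


def count_by_file(parquet_glob: str):
    """Run the query with the duckdb package (needs network access and credentials)."""
    import duckdb  # imported lazily so the SQL builder works without it

    return duckdb.sql(count_by_file_sql(parquet_glob)).fetchall()
```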

Step 7: Large-Scale Processing with Apache Sedona (via Wherobots)

This is where cloud-native geospatial really shines.

Using Wherobots (Sedona on a managed cloud platform), you can:

Load all GeoParquet files into a distributed compute environment
Run spatial SQL at scale
Join the dataset with Overture Maps rivers
Compute nearest-neighbor distances across ~23 million cropland polygons

A nearest-river join using spherical distance across the entire dataset completes in about 14 minutes, a job that would be impractical in desktop GIS and a stretch for many data warehouses.
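To show what "nearest river by spherical distance" means for a single feature, here is the idea in miniature using plain Python. This is not how Sedona does it: a distributed engine relies on spatial partitioning and indexing rather than the brute-force loop below, and it works on polygons and lines rather than the lon/lat points assumed here. The function names and sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2) -> float:
    """Great-circle distance between two lon/lat points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(point, candidates):
    """Return (distance_km, candidate) for the candidate closest to point."""
    return min((haversine_km(*point, *c), c) for c in candidates)


# Hypothetical field centroid and river points as (lon, lat)
print(nearest((2.35, 48.85), [(2.29, 48.86), (4.83, 45.76)]))
```

Doing this for every polygon against every river segment is quadratic work, which is why the full join is handed to a distributed engine.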

You can then write the result back out as GeoParquet or register it as an Iceberg table.

Why Cloud-Native Geospatial Matters

This tutorial walks through six different tools, but they all share a theme:

Cloud-native data unlocks powerful workflows:

No massive downloads
No local storage limits
Fast distributed processing when you need it
Consistent access across tools
Reproducible pipelines

Source Cooperative simplifies data access.

GeoParquet and COG simplify file formats.

DuckDB and Sedona handle the heavy processing when needed.

This is the direction much of the geospatial field is heading.

Practical Use Cases

1. Building analysis-ready pipelines

Load GeoParquet into PostGIS or DuckDB for workflows that need repeatable queries.

2. AI and ML feature engineering

Use distributed Sedona processing to generate features like distance to rivers, elevation summaries, or land use context.

3. Large-scale mapping

Serve PMTiles into Felt, MapLibre, or your own frontend without running a tile server.

4. Enterprise data integration

Cloud-native formats integrate cleanly with data warehouses, Spark jobs, and modern lakehouse architectures.

Conclusion

Cloud-native geospatial is no longer a niche concept.

Tools like Source Cooperative are making it practical and accessible, even for solo GIS analysts and small teams.

If you want to start working with massive datasets, build scalable analytics, or transition toward modern GIS data engineering, this workflow gives you everything you need.

And the best part is you can get started for free.

If you want to go deeper into these workflows, let me know. I’m building more tutorials on GeoParquet, DuckDB, Overture Maps, Sedona, and cloud-native GIS patterns.
