How to Use Source Cooperative for Cloud-Native Geospatial Data: A Complete Tutorial for QGIS, GeoPandas, DuckDB, PostGIS, and Apache Sedona
Cloud-native geospatial workflows are finally becoming accessible to everyday GIS analysts.
Instead of downloading giant zip files or managing your own S3 buckets, you can now stream geospatial data directly into your tools using modern formats like GeoParquet, COG, and PMTiles.
One of the platforms leading this shift is Source Cooperative, a data sharing project supported by Radiant Earth.
This tutorial walks through how to use Source Cooperative with tools you already know, including QGIS, GeoPandas, DuckDB, PostGIS, and Apache Sedona.
If you’re looking for a practical guide to cloud-native GIS, this is a great place to start.
What Is Source Cooperative?
Source Cooperative is a cloud-native geospatial data platform that lets organizations and individuals share data openly using simple web endpoints.
There are no APIs to manage and no infrastructure to deploy. You access data using standard HTTP or S3 paths.
This matters because:
- You don’t have to download entire datasets
- You don’t need to maintain an S3 bucket
- You can read cloud data directly into your GIS tools
- Many datasets are already stored in modern formats like GeoParquet and PMTiles
You can browse available datasets directly on the site and filter by keyword or file type. This tutorial uses EuroCrops, a high-resolution pan-European cropland dataset.
Key Cloud-Native Formats Used in This Tutorial
GeoParquet
A columnar, compressed, highly efficient geospatial format perfect for large vector datasets.
PMTiles
A single-file tile archive format ideal for web maps without servers.
COG (Cloud-Optimized GeoTIFF)
A raster format that supports partial reads, making cloud-based access efficient.
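To make "partial reads" concrete, here is a minimal sketch of how a client can inspect a remote Parquet file without downloading it. A Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes PAR1, so a reader only needs the last 8 bytes to learn where the metadata lives. The function names are illustrative, and the sketch assumes the server honors HTTP Range requests.

```python
import struct
import urllib.request

PARQUET_MAGIC = b"PAR1"


def parse_parquet_tail(tail: bytes) -> int:
    """Return the footer metadata length stored in the last 8 bytes of a Parquet file."""
    if len(tail) < 8 or tail[-4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    return footer_len


def fetch_tail(url: str, nbytes: int = 8) -> bytes:
    """Request only the last nbytes of a remote file (suffix Range request).

    Assumption: the server supports Range; if it does not, the whole
    body is returned and this function still reads all of it.
    """
    req = urllib.request.Request(url, headers={"Range": f"bytes=-{nbytes}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


# Local demonstration, no network needed: a fake tail with a 1234-byte footer.
fake_tail = struct.pack("<I", 1234) + PARQUET_MAGIC
print(parse_parquet_tail(fake_tail))  # prints 1234
```

Real readers such as GDAL, GeoPandas, and DuckDB do this for you, then fetch only the row groups and columns a query needs.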
Step 1: Explore and Download Data
Inside the EuroCrops dataset you’ll see folders for:
- GeoParquet (projected and unprojected)
- PMTiles
- FlatGeobuf
- Tile previews
You can download any file directly with a single click, or you can work with the data hosted on Source Cooperative.
For example, a GeoParquet file might look like:
<https://data.source.coop/cholmes/euroccrops/GeoParquetProjected/fr_2021_ec.parquet>
You can load that directly into tools without saving anything locally.
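The same file shows up in this tutorial under two address styles: an HTTPS URL and an S3 path. The sketch below builds both from the pieces of the path. The helper name and the "account / repository / key" terminology are assumptions made for illustration, based only on the paths shown in this tutorial.

```python
from urllib.parse import quote

HTTPS_ROOT = "https://data.source.coop"


def source_coop_paths(account: str, repository: str, key: str) -> dict:
    """Build the HTTPS and S3 forms of a Source Cooperative file path.

    Hypothetical helper: it mirrors the path layout used in this tutorial.
    """
    parts = [account, repository, *key.strip("/").split("/")]
    https_path = "/".join(quote(p) for p in parts)  # percent-encode for the URL form
    s3_path = "/".join(parts)                       # S3 paths are left unencoded
    return {
        "https": f"{HTTPS_ROOT}/{https_path}",
        "s3": f"s3://{s3_path}",
    }


paths = source_coop_paths(
    "cholmes", "euroccrops", "GeoParquetProjected/fr_2021_ec.parquet"
)
print(paths["https"])
print(paths["s3"])
```

Use the HTTPS form for browsers and tools that read plain URLs, and the S3 form for tools with S3 support such as GDAL, GeoPandas, and DuckDB.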
Step 2: Load Data in Felt (Easiest Option)
Felt supports drag-and-drop uploads of GeoParquet.
You can also add PMTiles using the “Add from URL” feature.
This gives you an instant interactive map without any extra configuration, which is perfect for previewing large datasets.
Step 3: Load Data in QGIS
Windows and Linux
QGIS already ships with GeoParquet support. Just drag the file in.
macOS
You must install the GeoParquet driver manually using Conda.
Alternatively, you can load the data into PostGIS (next step) and connect QGIS to the database.
Step 4: Load Source Cooperative Data into PostGIS
Using GDAL/OGR, you can load GeoParquet directly into a PostGIS database.
Because Source Cooperative uses authenticated S3 endpoints, you export environment variables such as:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
AWS_DEFAULT_REGION
Then use the GDAL virtual file system path:
/vsis3/cholmes/euroccrops/GeoParquetProjected/fr_2021_ec.parquet
Finally, load into PostGIS:
ogr2ogr -f PostgreSQL \
  PG:"host=localhost dbname=postgis user=postgres password=..." \
  /vsis3/cholmes/euroccrops/GeoParquetProjected/fr_2021_ec.parquet \
  -nln eurocrops_fr
Once imported, you can visualize the table immediately in QGIS.
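If you want to script this step, the sketch below wraps the same ogr2ogr call in Python. It converts an s3:// path to GDAL's /vsis3/ form, checks that credentials are present, and builds the argument list. The function names are illustrative, and treating only the access key and secret key as required is an assumption; your setup may also need the session token and region.

```python
import os
import subprocess

# Assumption: these two are the minimum; add the token and region if you need them.
REQUIRED_ENV = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def to_vsis3(s3_uri: str) -> str:
    """Convert s3://bucket/key into GDAL's /vsis3/bucket/key form."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"expected an s3:// path, got {s3_uri!r}")
    return "/vsis3/" + s3_uri[len(prefix):]


def missing_credentials(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


def build_ogr2ogr_command(s3_uri: str, pg_dsn: str, table: str) -> list:
    """Build the argument list for the ogr2ogr call shown above."""
    return ["ogr2ogr", "-f", "PostgreSQL", f"PG:{pg_dsn}",
            to_vsis3(s3_uri), "-nln", table]


def load_into_postgis(s3_uri: str, pg_dsn: str, table: str) -> None:
    """Run ogr2ogr; requires GDAL on PATH and a reachable PostGIS database."""
    missing = missing_credentials()
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    subprocess.run(build_ogr2ogr_command(s3_uri, pg_dsn, table), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the PG: connection string.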
Step 5: Use Source Cooperative Data with GeoPandas
GeoPandas can read GeoParquet from S3 with only a few lines of code:
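import geopandas as gpd

# ACCESS_KEY, SECRET_KEY, and SESSION_TOKEN are placeholders for your own credentials.
gdf = gpd.read_parquet(
    "s3://cholmes/euroccrops/GeoParquetProjected/fr_2021_ec.parquet",
    storage_options={
        "key": ACCESS_KEY,
        "secret": SECRET_KEY,
        "token": SESSION_TOKEN,
    },
)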
You then visualize it with:
gdf.plot(figsize=(10, 10))
GeoPandas also supports partitioned datasets, which is extremely useful for large catalogs like Overture Maps.
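Rather than hard-coding credentials, you can build the storage options from the same environment variables used in the PostGIS step. The sketch below also passes a columns list so that only the columns you need are read. The helper names are illustrative, and the column names in the usage note are placeholders, not the actual EuroCrops schema.

```python
import os


def storage_options_from_env(env=None) -> dict:
    """Build s3fs-style storage options from AWS environment variables."""
    env = os.environ if env is None else env
    options = {
        "key": env["AWS_ACCESS_KEY_ID"],
        "secret": env["AWS_SECRET_ACCESS_KEY"],
    }
    token = env.get("AWS_SESSION_TOKEN")
    if token:  # the session token is only present for temporary credentials
        options["token"] = token
    return options


def read_columns(path: str, columns: list, env=None):
    """Read selected columns of a GeoParquet file into a GeoDataFrame.

    geopandas is imported lazily so the helper above works without it.
    Keep the geometry column in `columns`, or GeoPandas cannot build a GeoDataFrame.
    """
    import geopandas as gpd

    return gpd.read_parquet(
        path,
        columns=columns,
        storage_options=storage_options_from_env(env),
    )
```

A call might look like read_columns("s3://cholmes/euroccrops/GeoParquetProjected/fr_2021_ec.parquet", ["geometry", "some_attribute"]), where some_attribute stands in for a real column name.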
Step 6: Use the Data in DuckDB (with Ibis or SQL)
DuckDB now supports:
- Reading GeoParquet
- Reading S3-hosted files
- Spatial operations through DuckDB-Spatial
- Working with millions of features in memory
Using Ibis
import ibis

con = ibis.duckdb.connect()
con.raw_sql("INSTALL spatial; LOAD spatial;")
table = con.read_parquet(
    "s3://cholmes/euroccrops/GeoParquetProjected/*.parquet",
    secret="aws_secrets",
)
table.head()
Using pure SQL
SELECT COUNT(*)
FROM read_parquet('s3://cholmes/euroccrops/GeoParquetProjected/*.parquet');
Counting ~22 million polygons takes only a few seconds.
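To run the same count from Python, a minimal sketch with the duckdb package might look like this. It assumes DuckDB can pick up your AWS credentials through the credential_chain secret provider, which relies on DuckDB's aws extension being available. The secret name source_coop and the helper names are made up for illustration.

```python
def setup_statements() -> list:
    """SQL to run once per connection before reading from S3."""
    return [
        "INSTALL httpfs",
        "LOAD httpfs",
        "INSTALL spatial",
        "LOAD spatial",
        # Assumption: credentials come from the environment via the AWS credential chain.
        "CREATE OR REPLACE SECRET source_coop (TYPE s3, PROVIDER credential_chain)",
    ]


def count_query(parquet_glob: str) -> str:
    """Build the COUNT(*) query for a Parquet path or glob."""
    escaped = parquet_glob.replace("'", "''")  # escape single quotes for SQL
    return f"SELECT COUNT(*) FROM read_parquet('{escaped}')"


def count_features(parquet_glob: str) -> int:
    """Run the count; needs the duckdb package and network access."""
    import duckdb  # imported lazily so the helpers above work without it

    con = duckdb.connect()
    for statement in setup_statements():
        con.execute(statement)
    return con.execute(count_query(parquet_glob)).fetchone()[0]


print(count_query("s3://cholmes/euroccrops/GeoParquetProjected/*.parquet"))
```

The count is fast because DuckDB can answer it largely from Parquet metadata instead of scanning every geometry.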
Step 7: Large-Scale Processing with Apache Sedona (via Wherobots)
This is where cloud-native geospatial really shines.
Using Wherobots (Sedona on a managed cloud platform), you can:
- Load all GeoParquet files into a distributed compute environment
- Run spatial SQL at scale
- Join the dataset with Overture Maps rivers
- Compute nearest-neighbor distances across ~23 million cropland polygons
A nearest-river join using spherical distance across the entire dataset completes in about 14 minutes, a workload that would be impractical in desktop GIS and difficult in many data warehouses.
You can then write the result back out as GeoParquet or register it as an Iceberg table.
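To show what a nearest-neighbor join on spherical distance computes, here is a tiny brute-force version in plain Python. It is a conceptual sketch only, not the Sedona or Wherobots API: it works on points rather than polygons and lines, compares every candidate, and the sample coordinates are invented. Sedona performs the equivalent work with spatial indexing across a cluster.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometers between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest(point: tuple, candidates: dict) -> tuple:
    """Return (name, distance_km) of the candidate closest to point.

    point is (lon, lat); candidates maps a name to (lon, lat).
    """
    name = min(candidates, key=lambda n: haversine_km(*point, *candidates[n]))
    return name, haversine_km(*point, *candidates[name])


# Invented sample data: one field centroid and two river points.
field = (2.35, 48.85)
rivers = {"river_a": (2.40, 48.85), "river_b": (5.00, 45.00)}
print(nearest(field, rivers)[0])  # prints river_a
```

Brute force costs one distance calculation per pair, which is why the full dataset needs a distributed engine.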
Why Cloud-Native Geospatial Matters
The six tools in this tutorial share one theme. Cloud-native data unlocks powerful workflows:
- No massive downloads
- No local storage limits
- Fast distributed processing when you need it
- Consistent access across tools
- Reproducible pipelines
Source Cooperative simplifies data access.
GeoParquet and COG simplify file formats.
DuckDB and Sedona handle the heavy processing when needed.
This is the direction the entire geospatial field is heading.
Practical Use Cases
1. Building analysis-ready pipelines
Load GeoParquet into PostGIS or DuckDB for workflows that need repeatable queries.
2. AI and ML feature engineering
Use distributed Sedona processing to generate features like distance to rivers, elevation summaries, or land use context.
3. Large-scale mapping
Serve PMTiles into Felt, MapLibre, or your own frontend without running a tile server.
4. Enterprise data integration
Cloud-native formats integrate cleanly with data warehouses, Spark jobs, and modern lakehouse architectures.
Conclusion
Cloud-native geospatial is no longer a niche concept.
Tools like Source Cooperative are making it practical and accessible, even for solo GIS analysts and small teams.
If you want to start working with massive datasets, build scalable analytics, or transition toward modern GIS data engineering, this workflow gives you everything you need.
And the best part is you can get started for free.
If you want to go deeper into these workflows, let me know. I’m building more tutorials on GeoParquet, DuckDB, Overture Maps, Sedona, and cloud-native GIS patterns.
