SedonaDB vs DuckDB vs PostGIS: Which spatial SQL engine is fastest?

Last week the team behind Apache Sedona released SedonaDB, a new tool that makes spatial processing fast and treats spatial data as a first-class citizen.
Put another way: it is really fast for spatial analytics. You can work with it through Python and SQL, and it can also write data out as GeoParquet in the simplest and easiest way I have come across so far.
Now this may sound similar to many other tools that already exist, such as PostGIS, DuckDB, and GeoPandas. Why do we need another tool like this, and what does it accomplish? Those are fair questions, and many of them have been asked since it was released.
In this post I hope to answer them by comparing SedonaDB to DuckDB and PostGIS on two criteria.
The first is speed: how fast is it compared to these engines?
The second is simplicity: how easy is each system to use?
Comparing PostGIS vs. DuckDB vs. SedonaDB
There are many differences between these three frameworks for spatial workloads, but here are the key takeaways:
- SedonaDB: purpose-built, columnar, Rust-native, great for spatial SQL at analytic scale without needing Spark or a Postgres instance.
- DuckDB: excellent fast OLAP for Parquet/Arrow; spatial is still evolving but easy to embed.
- PostGIS: battle-tested, feature-rich, enterprise DB; heavier to operate and less columnar/analytical.
PostGIS is a full database that requires you to load your data into tables. It's great when you need transactions, shared access, and very mature spatial features, but data loading and optimization require several extra steps. That said, it has the most mature spatial support of any of these systems.
SedonaDB is built from the ground up for spatial work and makes running spatial queries fast and simple without needing a big database. It can read data from the cloud or locally and is focused on making spatial analytics operations as fast as possible on a single node.
DuckDB is lightweight and great for quickly analyzing local files like Parquet or CSV on a single machine with easy setup; its spatial capabilities come through an extension rather than being built in.
Here is a more fully featured comparison chart:
| Feature | SedonaDB | DuckDB (with spatial extension) | PostGIS |
|---|---|---|---|
| Primary Purpose | Spatial-native single-node analytical database built for geospatial SQL and vector workloads. | General-purpose in-process OLAP database; spatial via extension. | Traditional relational database (PostgreSQL) with spatial support, primarily for transactional workloads. |
| Underlying Engine | Rust + Apache DataFusion + Apache Arrow. Deep integration with GeoArrow & Sedona's spatial APIs. | C++ OLAP engine, columnar storage with Arrow interoperability. | PostgreSQL (row-oriented) with spatial types/functions from PostGIS C libraries (GEOS, PROJ, GDAL). |
| Separation of Storage & Compute | Flexible: can run over local files or a remote data lake (e.g., S3) without a full DB install; compute and storage loosely coupled. | Mostly embedded: compute engine runs in the same process as the client; limited external storage separation beyond reading Parquet files. | Tightly coupled: the database manages both storage & compute; scaling requires full Postgres instance(s). |
| Deployment Model | Lightweight binary or Python package (pip install apache-sedona[db]); no external service needed; good for dev-to-cloud portability. | Fully embedded (no server); ships as a single binary or library; integrates easily with Python/R/CLI. | Server-based; must run a PostgreSQL cluster; heavier ops overhead but robust multi-user support. |
| Spatial Support (Native) | First-class: spatial types, joins, CRS validation & errors, KNN, indexing built into the engine (no plugin). | Added via DuckDB-spatial; improving, but some advanced ops (CRS checks, KNN) are less mature. | Mature: extensive geometry & geography functions, topology, raster support (PostGIS Raster), indexing (GiST/SP-GiST). |
| Performance Focus | Analytical & vector workloads on small-to-medium data; optimized joins, pruning, KNN. | Analytical OLAP on local/Parquet; very fast scans & aggregations. | General RDBMS; can handle both OLTP and analytics, but the row store can slow large spatial analytics. |
| Raster Roadmap | Planned future support. | Experimental; community extensions exist. | Supported (PostGIS Raster). |
When to use PostGIS vs. DuckDB vs. SedonaDB
To keep it simple, here is my rule of thumb:
Choose SedonaDB when you need a lightweight but spatially powerful SQL engine to run safe, fast analytics on local or data lake files without standing up a full database.
In short, when you need a fast engine for spatial analytics on files on your machine or in the cloud, with simple setup.
Choose DuckDB when you want a simple, zero-setup, embedded analytics engine for local Parquet/CSV data and occasional spatial work in notebooks.
In short, when you need a fully featured analytical database that is easy to set up, for spatial work that isn't overly heavy in nature.
Choose PostGIS when you need a durable, multi-user, production-grade spatial database for apps, transactions, and complex geospatial operations.
In short, when you need a production database for transactional workloads (INSERT, UPDATE, DELETE) and are okay with the setup and maintenance of that system.
Comparing each system using four spatial query types
Whenever you see a spatial system comparison, most performance benchmarks tend to focus on the spatial join: either joining two datasets or doing a join plus an aggregation (such as a point-in-polygon count). While that is a valid benchmark, there are many other spatial comparisons that make sense. SpatialBench was also released as part of the SedonaDB release, which you can see and test here.
This post focuses on four different spatial queries using real data from the US Census and the New York City open data portal:
- Spatial join with an aggregation: Number of building polygons in each neighborhood in New York City
- Distance aggregation join: Number of buildings within 200 meters of every fire hydrant in New York City
- Area weighted interpolation: Total population within 800 meters of every building in New York City, clipping census geometries and counting only population within that area (i.e. 100 people in an area with 50% overlap with the buffer counts 50 people)
- KNN join: 5 nearest fire hydrants to every building in NYC
These are queries that are commonly used for spatial analytics questions but often don't get asked, because they are computationally complex and can take a long time to run, or not complete at all.
Yet these types of questions can be invaluable in spatial analytics and as features in machine learning workflows. So let’s see how you can run this test yourself.
Loading data in each system
First, you can download all the data I used in the examples from this Google Drive folder. None of it is cloud native (and it includes one GeoDatabase), so it represents most types of spatial data you will encounter (unless you choose to turn it into GeoParquet).
In fact you can get all the code seen in this post from this GitHub repo:
https://github.com/mbforr/postgis-duckdb-sedonadb-comparison?tab=readme-ov-file
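One note before the examples: the SedonaDB snippets below assume a context object named sd. A minimal setup, assuming the import pattern from the SedonaDB documentation (double-check the exact call against the current docs):
import sedona.db

# Create the SedonaDB context used as `sd` in the examples below
sd = sedona.db.connect()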
Loading data into SedonaDB is quite easy. If the data is already in GeoParquet format you can read it in directly:
sd.read_parquet(
"s3://overturemaps-us-west-2/release/2025-08-20.0/theme=divisions/type=division_area/",
options={"aws.skip_signature": True, "aws.region": "us-west-2"},
).to_view("division_area")
Otherwise you can read it via GeoPandas and then create a SedonaDB DataFrame from it, as we do in this example:
# Read our files via GeoPandas since they are not in GeoParquet format
import geopandas as gpd
buildings = gpd.read_file('data/Building_Footprints.geojson')
# Create a SedonaDB DataFrame from the GeoPandas GeoDataFrame
buildings_db = sd.create_data_frame(buildings)
buildings_db.show(3)
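One hedged note: the SedonaDB SQL queries later in this post reference names like buildings_db, hydrants_db, neighborhoods_db, and census_db. If you want to run them as SQL rather than through the DataFrame API, register each DataFrame as a view, the same way the GeoParquet example above does:
# Register the DataFrame as a view so SQL queries can reference it by name
buildings_db.to_view("buildings_db")

# Repeat for the other datasets, e.g. hydrants_db.to_view("hydrants_db")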
For DuckDB you can read files in using the ST_Read function, as seen here:
con.sql('''describe from st_read('data/Building_Footprints.geojson')''').show()
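Note that ST_Read and the other spatial functions live in DuckDB's spatial extension, so it has to be installed and loaded first. A minimal setup, assuming the connection object is called con as in the snippet above:
import duckdb

# Open an in-memory DuckDB connection and load the spatial extension
con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")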
PostGIS requires some extra work to set up the database using Docker, as shown below.
Building a PostGIS database in Docker and loading data with GDAL
The fastest way I know to set up a PostGIS database is with Docker. Thankfully the team at Kartoza maintains a fantastic Docker image that I have used regularly. The repo above contains a docker-compose.yml file that you can use to create it.
Once you have installed Docker on your machine and started it, navigate to the directory where you cloned the GitHub repo and run this command:
docker-compose up --build
That will start up a PostGIS database with these settings (sketched below):
- Database: gis
- User: docker
- Password: docker
- Port: 25432
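The docker-compose.yml in the repo is the authoritative version, but a minimal sketch using the Kartoza image looks roughly like this (the environment variable names follow the kartoza/postgis image; double-check them against that image's README):
services:
  postgis:
    image: kartoza/postgis
    environment:
      # These match the settings listed above
      POSTGRES_DB: gis
      POSTGRES_USER: docker
      POSTGRES_PASS: docker
    ports:
      - "25432:5432"
    volumes:
      - postgis-data:/var/lib/postgresql

volumes:
  postgis-data: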
I chose to connect mine in QGIS but you can use pgAdmin as well.

Next there are two more steps. First you need to install GDAL to use ogr2ogr to import your data. You can use other methods as well, but I find this the most efficient since I can prune data on the way in using the built-in SpatiaLite functionality.
Here is a great guide to get started with installing GDAL. Then go to the import_commands.sh file and run the various import commands from the same directory as the data you downloaded. Here is one example command:
ogr2ogr \
-f PostgreSQL PG:"host=localhost user=docker password=docker dbname=gis port=25432" \
Building_Footprints.geojson \
-nln nyc_building_footprints -lco GEOMETRY_NAME=geom
Creating multiple geometry columns in PostGIS and adding spatial indexes
Once the data is in, we want to optimize the tables as much as possible. This can greatly decrease query time, and it is a process I outlined in my Spatial SQL book. Below are the core steps you need to know.
- Create geometry columns for the 4326 and 3857 projections
- Populate the columns with the transformed geometry that was imported
- Create a GiST index on each new column
alter table TABLE_NAME
add column geom_4326 geometry,
add column geom_3857 geometry;
update TABLE_NAME
set geom_4326 = st_transform(st_makevalid(geom), 4326),
geom_3857 = st_transform(st_makevalid(geom), 3857);
create index geom_3857_idx_TABLE_NAME on
TABLE_NAME using gist(geom_3857);
create index geom_4326_idx_TABLE_NAME on
TABLE_NAME using gist(geom_4326);
And finally you will want to make sure the geometries in your neighborhoods table are valid:
update nyc_neighborhoods set geom = st_makevalid(geom);
Now that we have the data in, we can run our tests. I am running the SedonaDB and DuckDB queries on my 2023 MacBook Pro with an M3 chip and 18GB of memory. The PostGIS database is running in Docker, to which I have allocated 8 CPUs, an 18GB memory limit, and 4GB of swap.
Below are the queries in each system and their performance in each.
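If you want to reproduce the timings yourself, a simple wall-clock helper is enough. The sketch below is a hypothetical example rather than the exact code from the repo; it assumes you pass in a callable that executes and fully materializes the query.
import time

def time_query(run, label):
    # run is a zero-argument callable that executes and materializes the query
    start = time.perf_counter()
    run()
    print(f"{label}: {time.perf_counter() - start:.2f}s")

# Hypothetical usage, where query_1 holds the SQL text shown below:
# time_query(lambda: sd.sql(query_1).show(), "SedonaDB")
# time_query(lambda: con.sql(query_1).fetchall(), "DuckDB")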
Query 1: Spatial Join with Aggregation
SedonaDB: 0.24s
DuckDB: 43.1s
PostGIS: 6.4s
-- Spatial Join
-- SedonaDB
select
count(a.bin) as building_count,
b.neighborhood
from neighborhoods_db b
join buildings_db a
on st_intersects(a.geometry, b.geometry)
group by b.neighborhood
-- DuckDB
select
count(a.bin) as building_count,
b.neighborhood
from st_read('data/Building_Footprints.geojson') a
join st_read('data/nyc_hoods.geojson') b
on st_intersects(a.geom, b.geom)
group by b.neighborhood
-- PostGIS
select
count(a.bin) as building_count,
b.neighborhood
from nyc_neighborhoods b
join nyc_building_footprints a
on st_intersects(a.geom, b.geom)
group by b.neighborhood
Query 2: Distance Within Aggregation
SedonaDB: 1.3s
DuckDB: 41.5s
PostGIS: 21.8s
-- Distance Within
-- SedonaDB
select
count(a.bin) as building_count,
b.unitid
from hydrants_db b
join buildings_db a
on st_dwithin(
st_transform(a.geometry, '3857'),
st_transform(b.geometry, '3857'),
200)
group by b.unitid
-- DuckDB
select
count(a.bin) as building_count,
b.unitid
from st_read('/Users/mattforrest/Documents/spatial-sql-data/NYCDEPCitywideHydrants.geojson') b
join st_read('/Users/mattforrest/Documents/spatial-sql-data/Building_Footprints.geojson') a
on st_dwithin(
st_transform(a.geom, 'EPSG:4326', 'EPSG:3857'),
st_transform(b.geom, 'EPSG:4326', 'EPSG:3857'),
200)
group by b.unitid
-- PostGIS
select
count(a.bin) as building_count,
b.unitid
from nyc_fire_hydrants b
join nyc_building_footprints a
on st_dwithin(
a.geom_3857,
b.geom_3857,
200)
group by b.unitid
Query 3: Area Weighted Interpolation
SedonaDB: 6.6s
DuckDB: 97.6s
PostGIS: 28.9s
-- Area Weighted Interpolation
-- SedonaDB
select
b.bin,
sum(
a.population * (
st_area(
st_intersection(st_buffer(st_transform(b.geometry, '3857'), 800), st_transform(a.geometry, '3857'))
) / st_area(st_transform(a.geometry, '3857'))
)
) as pop
from census_db a
join buildings_db b
on st_intersects(a.geometry, b.geometry)
group by b.bin
-- DuckDB
select
b.bin,
sum(
a.population * (
st_area(
st_intersection(st_buffer(st_transform(b.geom, 'EPSG:4326', 'EPSG:3857'), 800), st_transform(a.geom, 'EPSG:4326', 'EPSG:3857'))
) / st_area(st_transform(a.geom, 'EPSG:4326', 'EPSG:3857'))
)
) as pop
from st_read('/Users/mattforrest/Documents/spatial-sql-data/nys_census_blocks.geojson') a
join st_read('/Users/mattforrest/Documents/spatial-sql-data/Building_Footprints.geojson') b
on st_intersects(a.geom, b.geom)
group by b.bin
-- PostGIS
select
b.bin,
sum(
a.population * (
st_area(
st_intersection(st_buffer(st_transform(b.geom, 3857), 800), st_transform(a.geom, 3857))
) / st_area(st_transform(a.geom, 3857))
)
) as pop
from nys_2021_census_block_groups a
join nyc_building_footprints b
on st_intersects(a.geom, b.geom)
group by b.bin
Query 4: KNN Join
SedonaDB: 1.8s
DuckDB: Memory error
PostGIS: 83.3s
-- KNN Join
-- SedonaDB
select
a.bin as building_id,
b.unitid as hydrant_id,
st_distance(a.geometry, b.geometry)
from hydrants_db b
join buildings_db a
on ST_KNN(
a.geometry,
b.geometry,
5,
true)
-- DuckDB
select
a.bin as building_id,
b.unitid as hydrant_id,
b.distance
from st_read('/Users/mattforrest/Documents/spatial-sql-data/Building_Footprints.geojson') a
cross join lateral (
select
unitid,
st_distance(a.geom, geom) as distance
from
st_read('/Users/mattforrest/Documents/spatial-sql-data/NYCDEPCitywideHydrants.geojson')
order by
distance
limit 5
) b
-- PostGIS
select
a.bin as building_id,
b.unitid as hydrant_id,
b.distance
from nyc_building_footprints a
cross join lateral (
select
unitid,
st_distance(a.geom_3857, geom_3857) as distance,
row_number() over() as ranking
from
nyc_fire_hydrants
order by
a.geom <-> geom
limit 5
) b
Final comparisons for DuckDB vs. SedonaDB vs. PostGIS
Here is a chart with the final results from the various queries.

What the chart doesn't tell you is the experience of using these systems. PostGIS is, and has been, the leader in spatial databases. It has the most functions of any spatial database and has extensions for many different types of operations, including raster functionality.
That said, for quick analysis you have to account for setup time, loading data into tables, indexing, and other cleanup tasks before you can even write a query. On top of that, this tutorial gives PostGIS the best possible chance to perform: stored geometry columns that are already transformed, spatial indexes, and no on-the-fly geometry validation.
PostGIS is still the best choice for transactional workloads, but at a certain point the effort to load data, combined with the runtime of larger processes, outweighs the benefit for analytical workloads that could be run in a different system and loaded into PostGIS after the fact.
I first found DuckDB when I was working primarily in cloud data warehouses. The effort to get data into those systems was much like PostGIS, only greater since the volumes were larger. The process often involved many more steps:
- Convert spatial file to CSV with WKT in correct projection
- Load CSVs to cloud storage
- Load CSVs into tables in the data warehouse
- Turn WKT into geometries
- Validate geometries
DuckDB with the spatial extension allowed me to test and validate the queries I would eventually run locally before taking all these steps, ensuring it would work and sometimes being able to run the query itself. It made all of this much easier and I could move to the cloud only when required.
But it struggled with larger, more complex spatial queries, and some might only work if I moved the data into a proper DuckDB table, which defeated the purpose of it being a standalone query engine. It was still better than the alternative, but the limitations were starting to show, not to mention that CRS support was only added later on.
SedonaDB solves those issues for spatial analytics. It makes spatial data front and center and handles complex and large queries with ease. Even when skipping steps that other systems need, like CRS transformations, it still runs fast. This means it can not only close the gap for fast spatial processing but also greatly simplify the workflow, and I am excited to see where it goes from here as it adds support for raster data and more spatial functions.