Spatial SQL Cookbook: Common GIS Analyses in Spatial SQL
One of the biggest challenges, at least for me, when getting started with spatial SQL is having a quick and easy reference to transfer your current GIS workflows into SQL. There are many amazing resources out there to expand this knowledge, but this guide is meant to be a really simple cookbook to start to translate your current workflows into spatial SQL.
The Spatial SQL Book – Available now!
Check out my new book on Spatial SQL with 500+ pages to help you go from SQL novice to spatial SQL pro.
Other resources
A few things to note:
- I will write SQL code for PostGIS and common data warehouses
- Not every data warehouse supports projections – for example, BigQuery and Snowflake only support WGS 84 SRID 4326
- This guide was published in February 2022 and updates will be made as needed
Working with geometries
1. Create a point geometry from a latitude/longitude pair
This query will create a geometry from numeric columns that contain latitude and longitude values.
# PostGIS
SELECT
ST_SetSRID(ST_MakePoint(lng, lat), 4326) as point_geom
FROM table
# BigQuery
SELECT
ST_GEOGPOINT(lng, lat) as point_geom
FROM table
# Snowflake and Redshift
SELECT
ST_MakePoint(lng, lat) as point_geom
FROM table
2. Create a geometry from Well Known Text
Similar to above, this will create a geometry from a WKT string:
# PostGIS, Redshift
SELECT
ST_GeomFromText('POINT(-71.064544 42.28787)', 4326) as geom
FROM table
# BigQuery
SELECT
ST_GEOGFROMTEXT('POINT(-71.064544 42.28787)') as geom
FROM table
# Snowflake
SELECT
ST_GEOGFROMWKT('POINT(-71.064544 42.28787)') as geom
FROM table
3. Change your data to a different projection
You can re-project your data using a simple function to a new projection using this query:
# PostGIS and Redshift
SELECT
ST_Transform(geom, 4326) as geom
FROM table
4. Get a latitude and longitude from a geometry
Using a point geometry you can extract the latitude and longitude as numeric values:
# PostGIS, BigQuery, Snowflake, and Redshift
SELECT
ST_X(geom) as longitude,
ST_Y(geom) as latitude
FROM table
Measurements
5. Find the area of a polygon
Find the area of a polygon in square meters
# PostGIS, BigQuery, Snowflake, and Redshift
SELECT
ST_Area(geom) as area
FROM table
6. Find the perimeter of a polygon
Similar to above but perimeter of a polygon in meters
# PostGIS, BigQuery, Snowflake, and Redshift
SELECT
ST_Perimeter(geom) as area
FROM table
7. Find the length of a line
Calculate the length of a line in meters
# PostGIS, BigQuery, Snowflake, and Redshift
SELECT
ST_Length(geom) as length
FROM table
8. Find the distance between to geometries
Returns the distance in meters between two geometries. Each of the different functions has a different signature for using the spheroid, or curved earth, calculation so make sure to check the docs in this case.
# PostGIS and BigQuery
SELECT
ST_Distance(geom_1, geom_2) as distance
FROM table
9. Find the shortest distance, to a specific point, between two geometries
Given two geometries, find the closest point on geometry 2 to geometry 1. The function will return a new point that represents the closest point on the second geometry to the first geometry.
# PostGIS and BigQuery
SELECT
ST_ClosestPoint(geom_1, geom_2) as closest_point
FROM table
Transform geometries
10. Create a buffer around a geometry
Create a buffer around a geometry, which returns a new geometry. The number in the function below is the distance in meters for the buffer.
# PostGIS, BigQuery, and Redshift
SELECT
ST_Buffer(geom, 100) as closest_point
FROM table
11. Centroid of a polygon or line
If you want to find the centroid of a geometry you can use this query to return a point of the centroid of a geometry:
# PostGIS, BigQuery, Snowflake, and Redshift
SELECT
ST_Centroid(geom) as centroid
FROM table
12. Create a concave or convex hull
You can create a concave or convex hull from a geometry or group of geometries, which will return a new geometry:
# PostGIS, BigQuery, and Redshift
SELECT
ST_ConvexHull(geom) as convex_hull
FROM table
SELECT
ST_ConcaveHull(geom) as convex_hull
FROM table
You can also do this by grouping geometries:
# ST_Collect will collect the matching geometries together into a GeometryCollection
SELECT
ST_ConvexHull(ST_Collect(geom)) as convex_hull,
category
FROM table
GROUP BY category
13. Union geometries together
If you want to union your geometries, use ST_Union to return a new geometry:
# PostGIS and BigQuery
SELECT
ST_Union(geom) as geom,
category
FROM table
GROUP BY category
14. Create Voronoi polygons
Create Voronoi polygons around your geometries, returning new geometries:
# PostGIS
SELECT
ST_VoronoiPolygons(geom) as voronoi_polygons
FROM table
15. Find the resulting polygon of an intersection
Commonly know as a clip in GIS, use this function to return the resulting intersecting area from two geometries:
# PostGIS, BigQuery, Redshift, and Snowflake
SELECT
ST_Intersection(geom_1, geom_2) as intersection
FROM table
Or if you want to find the leftover parts of a geometry relations ship you can use ST_Difference:
# PostGIS, BigQuery, Redshift, and Snowflake
SELECT
ST_Difference(geom_1, geom_2) as intersection
FROM table
Spatial Relationships
16. Find geometries within a certain distance
Find geometries within a certain distance of another geometry, commonly used in aggregation queries or WHERE clauses:
# PostGIS, BigQuery, Redshift, and Snowflake
SELECT
*
FROM table
WHERE ST_DWithin(
geom_1,
ST_GeomFromText('POINT(-73.9895258 40.7413529)',4326),
1000)
17. See if two geometries overlap, touch, cross, intersect, contain, etc. (or evaluate spatial relationships)
This is a more complicated one as there are several different functions, each with slight differences between them, to evaluate spatial relationships. Make sure to check the docs of the database or data warehouse you are using as these may vary, or may have different tolerances.
Below are the core functions for spatial relationships for PostGIS, which you can find more detailed information on here. Also, each function will return a boolean (true/false) based on the outcome of the spatial relationship which we can use a few ways (more below).
- ST_Equals – returns true if the two geometries are exactly equal
- ST_Intersects – returns true if the two geometries share any space in common
- ST_Disjoint – returns true is the two geometries do not share any space, or the opposite of ST_intersects
- ST_Crosses – Returns true if the geometries have some, but not all, interior points in common
- In more detail, “returns true if the intersection results in a geometry whose dimension is one less than the maximum dimension of the two source geometries and the intersection set is interior to both source geometries”.
- If the geometries intersect but also the intersection result is one less dimension (or it crosses the intersecting geometry one time) it will return true. Works for multipoint/polygon, multipoint/linestring, linestring/linestring, linestring/polygon, and linestring/multipolygon comparisons.
- ST_Overlaps – returns true if the two geometries overlap spatially, or intersect but one does not completely contain the other
- ST_Touches – returns true if one geometry touches another, but does not intersect the interior
- ST_Within – returns true if the first geometry is completely within the second geometry
- ST_Contains – returns true if the second geometry is completely contained by the first geometry (opposite of ST_Within)
# PostGIS, BigQuery, Redshift, and Snowflake
SELECT
ST_Intersects(geom_1, geom_2) as intersects
FROM table
Other functions
18 . Simplify a geometry
If you want to simplify a geometry you can use these functions to simplify and return new geometries. Note there are some functions that will not preserve topology (or touching lines/relationships) but these below will. The number below is the tolerance of change in meters.
# PostGIS
SELECT
ST_SimplifyPreserveTopology(geom, 1) as geom
FROM table
# PostGIS, BigQuery, Redshift, and Snowflake
SELECT
ST_Simplify(geom, 1) as geom
FROM table
19. Create random points
Generate N random points within a polygon. You can designate the number or use data from your table (illustrated below). Keep in mind each function will return different things (PostGIS returns a MultiPoint, BigQuery returns an array) so you may need to use other functions if you want to extract each point into a row of it’s own:
# PostGIS
SELECT
ST_GeneratePoints(geom, number_column) as geom
FROM table
# BigQuery
WITH point_lists AS (
SELECT `carto-un`.carto.ST_GENERATEPOINTS(geom, number_column) AS points
FROM table
)
SELECT points FROM point_lists CROSS JOIN point_lists.points
20. Cluster points
Analyze your data by creating spatial clusters of geometries. There are two methods, DBSCAN and KMeans, for doing this. Each function will return a number for the cluster each row belongs to and it also is a window function so it uses the OVER() operation here as well:
# PostGIS and BigQuery
SELECT
geom,
ST_ClusterDBSCAN(geom, desired_distance, min_geoms_per_cluster) OVER() as cluster_id
FROM table
# PostGIS and BigQuery
SELECT
geom,
ST_ClusterKMeans(geom, number_of_clusters) OVER() as cluster_id
FROM table
How to use spatial functions
There are many ways to use spatial functions and depending on your need, you can leverage these functions a few ways. These are just some examples, but fairly common use cases.
Single Geometry
First, you can query against a single geometry to other rows in your table:
SELECT
ST_Distance(
geom,
ST_GeomFromText('POINT(-73.9895258 40.7413529)')) as distance
FROM table
Same Table
You can also compare against values in your same table – this example uses a subquery to illustrate it.
SELECT
ST_Distance(
geom,
(SELECT geom FROM table WHERE id = 1)) as distance
FROM table
WHERE Clause
Spatial functions can also be used in a WHERE clause to limit results:
SELECT
*
FROM table
WHERE ST_Distance(
geom,
(SELECT geom FROM table WHERE id = 1)) < 100
Joins
Joins are also used to join two tables using a spatial function as a condition, oftentimes for aggreagations and spatial joins:
SELECT
a.id,
COUNT(b.id) as count
FROM table a
JOIN other_table b
ON ST_Contains(a.geom, b.geom)
GROUP BY a.id
Cross Joins
Cross joins can be used a number of ways, but essentially evaluate every value of one table against the values of the other table. They can also be used to perform some more complex “bulk loop joins”, which we will detail below.
SELECT
ST_Distance(a.geom, b.geom) as distance
a.id as id_one,
b.id as id_two
FROM table a, other_table b
Subqueries
Subqueries, like those shown above, can allow you to retrieve specific values or aggregates from other tables:
SELECT
a.id,
(SELECT COUNT(id) FROM table WHERE ST_DWithin(a.geom, geom, 400)) as within_400m
FROM other_table a
Aggregations
Our join example above illustrates how aggregations can be used, but there are many others as well. Check out this post for more details and use cases.
Other complex or specific functions
21. H3 Cells
The CARTO Spatial Extension provides functionality to create and manage H3 cells and other features. This query will create H3 cells within the boundary of a specific geometry (using the BigQuery table syntax:
WITH
mn AS (
SELECT
geom
FROM
project.dataset.table
WHERE
name = 'Minnesota')
SELECT
h3,
`carto-un`.carto.H3_BOUNDARY(h3) AS geom
FROM (
SELECT `carto-un`.carto.H3_POLYFILL(geom, 5) AS cells
FROM
mn),
UNNEST(cells) AS h3
22. Create map tiles
You can also create map tiles in spatial SQL. The video below details how do to this with some functions in PostGIS and pg_tileserv.
You can also do this in various data warehouses using the CARTO Spatial Extension, illustrated here in BigQuery:
CALL `carto-un`.carto.CREATE_TILESET(
R'''(
SELECT geom, type
FROM `carto-do-public-data.natural_earth.geography_glo_roads_410`
)
''',
R'''`cartobq.maps.natural_earth_roads`''',
STRUCT(
"Tileset name" AS name,
"Tileset description" AS description,
NULL AS legend,
0 AS zoom_min,
10 AS zoom_max,
"geom" AS geom_column_name,
NULL AS zoom_min_column,
NULL AS zoom_max_column,
1024 AS max_tile_size_kb,
"RAND() DESC" AS tile_feature_order,
true AS drop_duplicates,
R'''
"custom_metadata": {
"version": "1.0.0",
"layer": "layer1"
}
''' AS extra_metadata
)
);
23. Routing with roads or other networks
PgRouting provides routing functionality in PostGIS and you can calculate routes using some simple functions. You can find more info in this tutorial.
SELECT * FROM pgr_dijkstra(
'
SELECT gid AS id,
source,
target,
length AS cost
FROM ways
',
1,
2,
directed := false);
You can do the sample with the CARTO Analytics Toolbox. After creating a network, you can calculate the shortest paths:
CALL `carto-un`.carto.FIND_SHORTEST_PATH_FROM_NETWORK_TABLE(
"mydataset.network_table",
"mydataset.shortest_path_table",
"ST_GEOGPOINT(-74.0, 40.0)",
"ST_GEOGPOINT(-75.0, 41.0)"
);
Sample cookbook recipes
24. Simple nearest neighbor
A very common problem, solving the nearest neighbor for one specific value, increasing the LIMIT will add more results:
# PostGIS
SELECT id, (SELECT geom FROM another_table LIMIT 1) <-> geom AS distance
FROM table
ORDER BY distance
LIMIT 1
# BigQuery
SELECT
a.id,
ARRAY_AGG(b.geoid ORDER BY st_distance(a.geom,
b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as id,
ARRAY_AGG(st_distance(a.geom,
b.geom) ORDER BY st_distance(a.geom,
b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as dist
FROM
project.dataset.table_1 a
CROSS JOIN project.dataset.table_2 b
GROUP BY a.id
25. Calculate the percentage overlap between polygons
Calculate the percentage overlap of polygons with ST_Intersection and ST_Area. You can do this for one polygon for every row in a table or in a where clause to find overlapping areas that intersect a certain amount.
# PostGIS
WITH a AS (SELECT geom FROM table WHERE id = 1)
SELECT
ST_Area(ST_Intersection(geom, (SELECT geom FROM a))/ST_Area(geom) AS overlap
FROM table
ORDER BY overlap
# PostGIS
SELECT
COUNT(a.column),
b.id,
b.geom
FROM table b
LEFT JOIN table_2 a
ON ST_Intersects(a.geom, b.geom)
WHERE ST_Area(ST_Intersection(a.geom, b.geom)/ST_Area(geom) > .5
26. Find the distance between the two nearest points on two polygons that do not touch
# PostGIS
SELECT
ST_Distance(ST_ClosestPoint(a.geom,
b.geom),
ST_ClosestPoint(b.geom,
a.geom)) AS distance,
ST_MakeLine(ST_ClosestPoint(a.geom,
b.geom),
ST_ClosestPoint(b.geom,
a.geom)) AS geom,
a.objectid
FROM
table a
CROSS JOIN LATERAL (
SELECT
geom
FROM
table_2
WHERE
ST_Disjoint(a.geom,
geom)
AND a.objectid != objectid
ORDER BY
ST_Distance(geom,
a.geom) ASC LIMIT 1) b
# BigQuery
SELECT
ST_MakeLine(ST_ClosestPoint(a.geom,
b.geom),
ST_ClosestPoint(b.geom,
a.geom)) AS geom,
ST_Distance(ST_ClosestPoint(a.geom,
b.geom),
ST_ClosestPoint(b.geom,
a.geom)) AS geom,
a.id
FROM
table_one a
LEFT JOIN (
SELECT
one.geom,
two.id,
ROW_NUMBER() OVER (PARTITION BY two.id ORDER BY ST_Distance(one.geom, two.geom) ASC) AS rank
FROM
table_one one,
table_two two
WHERE
ST_Disjoint(one.geom,
two.geom)
ORDER BY
ST_Distance(one.geom,
two.geom) ASC ) b
USING
(id)
WHERE
b.rank = 1
27. Create a buffer around a geometry and clip out areas from the buffer
Take a geometry, create a buffer, then clip out areas from another layer from the buffer.
# PostGIS
SELECT
ST_Difference(
ST_Union(
ST_Buffer(a.geom, 1609)
),
ST_Union(b.geom))
as clipped_geom
FROM table a
LEFT JOIN table_2 b
ON ST_Intersects(a.geom, b.geom)
# BigQuery
SELECT
ST_Difference(
ST_Union_Agg(
ST_Buffer(a.geom, 1609)
),
ST_Union(b.geom))
as clipped_geom
FROM table a
LEFT JOIN table_2 b
ON ST_Intersects(a.geom, b.geom)
28. Bulk nearest neighbor joins
Join two tables together by finding the nearest nieghbor from one table to all rows in another table. Inspired by this post by Paul Ramsey.
# PostGIS
SELECT
a.geom,
b.name,
ST_Distance(a.geom, b.geom) AS distance
FROM
table a
CROSS JOIN LATERAL
(SELECT name, geom
FROM table_2 b
ORDER BY
a.geom <--> geom
LIMIT 1) b
# BigQuery
SELECT
a.id,
ARRAY_AGG(b.geoid ORDER BY st_distance(a.geom,
b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as id,
ARRAY_AGG(st_distance(a.geom,
b.geom) ORDER BY st_distance(a.geom,
b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as dist
FROM
table_one a
CROSS JOIN table_two b
GROUP BY a.id
29. Calculate the averages of values in neighboring polygons
For each polygon, find the average (or other aggregate value) of it’s neighboring polygons.
# BigQuery
WITH
one AS (
SELECT
a.name,
SUM(b.value) AS total
FROM
table a
LEFT JOIN
table_2 b
ON
ST_Touches(b.geom, a.geom)
GROUP BY
a.name)
SELECT
one.*,
b.geom
FROM
one
JOIN
table b
USING
(name)
# PostGIS
SELECT
a.geom,
a.name,
SUM(b.value) as total
FROM
table a
CROSS JOIN LATERAL
(SELECT value
FROM table_two
WHERE
ST_Intersects(geom, a.geom)
) AS b
GROUP BY a.name, a.geom
30. Aggregate points into H3 cells
# BigQuery
WITH
data AS (
SELECT
`carto-un`.carto.H3_FROMGEOGPOINT(geog, 4) AS h3id,
COUNT(*) AS agg_total
FROM `cartobq.docs.starbucks_locations_usa`
GROUP BY h3id
)
SELECT
h3id,
agg_total,
`carto-un`.carto.H3_BOUNDARY(h3id) AS geom
FROM
data
31. Rank nearest N neighbors row by row
For each row in one table, create a new set of N rows with the N nearest neighbors with a ranking. In short, each row in your evaluated table will return N new rows based on how many neighbors you want to find.
# PostGIS
SELECT
a.id,
t.geom,
ST_Distance(t.geom, a.geom) as closest_dist
FROM a
CROSS JOIN LATERAL (
SELECT *,
ROW_NUMBER() OVER(ORDER BY a.geom <-> geom) AS rank
FROM seattle_parks
LIMIT 5
) as t
# BigQuery
SELECT
b.id,
a.station_id,
a.rank,
a.name,
a.dist
FROM
table b
LEFT JOIN (
SELECT
b.station_id,
b.name,
ST_Distance(c.geom, b.geom) AS dist,
b.id,
ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY ST_Distance(c.geom, b.geom) ASC) AS rank
FROM
table b,
table_2 h ) a
USING
(id)
WHERE
rank < 4
Vanilla SQL Reference
There are a few vanilla SQL references here that I want to call out as well that will help you as you start to combine more complex analysis:
Joins
There are many different types of joins but they generally follow this syntax:
SELECT
a.column,
b.another_column
FROM a
JOIN b ON a.column = b.column
You can learn more about joins in this post. Specifically related to spatial SQL you will see joins using spatial relationships:
# main query...
JOIN b ON ST_Intersects(a.geom, b.geom)
The reason this works is that the function returns a boolean (true/false) similar to the return value of the equation below:
# main query...
JOIN b ON a.column = b.column
So the join will only complete where those values return true
CREATE TABLE
If you want to create a new table with the results of your query, that is actually quite easy:
CREATE TABLE new_table AS
SELECT * FROM a LIMIT 10
Just replace that simple query with any other query you want!
Common Table Expressions and Subqueries
Common Table Expressions or CTEs allow you to create “temporary tables” that exist only in the context of the query you are running. You can use these to filter or aggregate data at the top of your query chain. You can also chain them together:
WITH a AS (SELECT geom, column FROM a)
SELECT * FROM a
# Or chain multiple together
WITH a AS (SELECT geom, column FROM a),
b as (SELECT geom, column FROM b)
SELECT
a.geom,
b.column
FROM a
JOIN b USING (column)
Subqueries work in a similar fashion but in the context of the query itself:
SELECT
*
FROM a
WHERE ST_Intersects(geom, (SELECT geom FROM b LIMIT 1))
Add and update columns
It is also easy to add a new column to a table and update the results of that table using a function:
ALTER TABLE table ADD COLUMN centroid GEOMETRY
# To update the new column
UPDATE table SET centroid = ST_Centroid(geom)
Window Functions
Window functions allow you to define an operation over a specific window of data you want to study. You can define the data within this window, the order of the data, and more. My favorite example to illustrate this is a 7 day rolling average:
SELECT
*,
AVG(value) OVER (PARTITION BY id ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as rolling_average
FROM table