# Spatial SQL Cookbook: Common GIS Analyses in Spatial SQL

One of the biggest challenges, at least for me, when getting started with spatial SQL is having a quick and easy reference to transfer your current GIS workflows into SQL. There are many amazing resources out there to expand this knowledge, but this guide is meant to be a really simple cookbook to start to translate your current workflows into spatial SQL.

Other resources

## A few things to note:

• I will write SQL code for PostGIS and common data warehouses
• Not every data warehouse supports projections – for example, BigQuery and Snowflake only support WGS 84 SRID 4326
• This guide was published in February 2022 and updates will be made as needed

# Working with geometries

## 1. Create a point geometry from a latitude/longitude pair

This query will create a geometry from numeric columns that contain latitude and longitude values.

``````# PostGIS

SELECT
ST_SetSRID(ST_MakePoint(lng, lat), 4326) as point_geom
FROM table

# BigQuery

SELECT
ST_GEOGPOINT(lng, lat) as point_geom
FROM table

# Snowflake and Redshift

SELECT
ST_MakePoint(lng, lat) as point_geom
FROM table
``````

## 2. Create a geometry from Well Known Text

Similar to above, this will create a geometry from a WKT string:

``````# PostGIS, Redshift

SELECT
ST_GeomFromText('POINT(-71.064544 42.28787)', 4326) as geom
FROM table

# BigQuery

SELECT
ST_GEOGFROMTEXT('POINT(-71.064544 42.28787)') as geom
FROM table

# Snowflake

SELECT
ST_GEOGFROMWKT('POINT(-71.064544 42.28787)') as geom
FROM table``````

## 3. Change your data to a different projection

You can re-project your data using a simple function to a new projection using this query:

``````# PostGIS and Redshift

SELECT
ST_Transform(geom, 4326) as geom
FROM table``````

## 4. Get a latitude and longitude from a geometry

Using a point geometry you can extract the latitude and longitude as numeric values:

``````# PostGIS, BigQuery, Snowflake, and Redshift

SELECT
ST_X(geom) as longitude,
ST_Y(geom) as latitude
FROM table``````

# Measurements

## 5. Find the area of a polygon

Find the area of a polygon in square meters

``````# PostGIS, BigQuery, Snowflake, and Redshift

SELECT
ST_Area(geom) as area
FROM table``````

## 6. Find the perimeter of a polygon

Similar to above but perimeter of a polygon in meters

``````# PostGIS, BigQuery, Snowflake, and Redshift

SELECT
ST_Perimeter(geom) as area
FROM table``````

## 7. Find the length of a line

Calculate the length of a line in meters

``````# PostGIS, BigQuery, Snowflake, and Redshift

SELECT
ST_Length(geom) as length
FROM table``````

## 8. Find the distance between to geometries

Returns the distance in meters between two geometries. Each of the different functions has a different signature for using the spheroid, or curved earth, calculation so make sure to check the docs in this case.

``````# PostGIS and BigQuery

SELECT
ST_Distance(geom_1, geom_2) as distance
FROM table``````

## 9. Find the shortest distance, to a specific point, between two geometries

Given two geometries, find the closest point on geometry 2 to geometry 1. The function will return a new point that represents the closest point on the second geometry to the first geometry.

``````# PostGIS and BigQuery

SELECT
ST_ClosestPoint(geom_1, geom_2) as closest_point
FROM table``````

# Transform geometries

## 10. Create a buffer around a geometry

Create a buffer around a geometry, which returns a new geometry. The number in the function below is the distance in meters for the buffer.

``````# PostGIS, BigQuery, and Redshift

SELECT
ST_Buffer(geom, 100) as closest_point
FROM table``````

## 11. Centroid of a polygon or line

If you want to find the centroid of a geometry you can use this query to return a point of the centroid of a geometry:

``````# PostGIS, BigQuery, Snowflake, and Redshift

SELECT
ST_Centroid(geom) as centroid
FROM table``````

## 12. Create a concave or convex hull

You can create a concave or convex hull from a geometry or group of geometries, which will return a new geometry:

``````# PostGIS, BigQuery, and Redshift

SELECT
ST_ConvexHull(geom) as convex_hull
FROM table

SELECT
ST_ConcaveHull(geom) as convex_hull
FROM table``````

You can also do this by grouping geometries:

``````# ST_Collect will collect the matching geometries together into a GeometryCollection

SELECT
ST_ConvexHull(ST_Collect(geom)) as convex_hull,
category
FROM table
GROUP BY category``````

## 13. Union geometries together

If you want to union your geometries, use ST_Union to return a new geometry:

``````# PostGIS and BigQuery

SELECT
ST_Union(geom) as geom,
category
FROM table
GROUP BY category``````

## 14. Create Voronoi polygons

Create Voronoi polygons around your geometries, returning new geometries:

``````# PostGIS

SELECT
ST_VoronoiPolygons(geom) as voronoi_polygons
FROM table``````

## 15. Find the resulting polygon of an intersection

Commonly know as a clip in GIS, use this function to return the resulting intersecting area from two geometries:

``````# PostGIS, BigQuery, Redshift, and Snowflake

SELECT
ST_Intersection(geom_1, geom_2) as intersection
FROM table``````

Or if you want to find the leftover parts of a geometry relations ship you can use ST_Difference:

``````# PostGIS, BigQuery, Redshift, and Snowflake

SELECT
ST_Difference(geom_1, geom_2) as intersection
FROM table``````

# Spatial Relationships

## 16. Find geometries within a certain distance

Find geometries within a certain distance of another geometry, commonly used in aggregation queries or WHERE clauses:

``````# PostGIS, BigQuery, Redshift, and Snowflake

SELECT
*
FROM table
WHERE ST_DWithin(
geom_1,
ST_GeomFromText('POINT(-73.9895258 40.7413529)',4326),
1000)``````

## 17. See if two geometries overlap, touch, cross, intersect, contain, etc. (or evaluate spatial relationships)

This is a more complicated one as there are several different functions, each with slight differences between them, to evaluate spatial relationships. Make sure to check the docs of the database or data warehouse you are using as these may vary, or may have different tolerances.

Below are the core functions for spatial relationships for PostGIS, which you can find more detailed information on here. Also, each function will return a boolean (true/false) based on the outcome of the spatial relationship which we can use a few ways (more below).

• ST_Equals – returns true if the two geometries are exactly equal
• ST_Intersects – returns true if the two geometries share any space in common
• ST_Disjoint – returns true is the two geometries do not share any space, or the opposite of ST_intersects
• ST_Crosses – Returns true if the geometries have some, but not all, interior points in common
• In more detail, “returns true if the intersection results in a geometry whose dimension is one less than the maximum dimension of the two source geometries and the intersection set is interior to both source geometries”.
• If the geometries intersect but also the intersection result is one less dimension (or it crosses the intersecting geometry one time) it will return true. Works for multipoint/polygon, multipoint/linestring, linestring/linestring, linestring/polygon, and linestring/multipolygon comparisons.
• ST_Overlaps – returns true if the two geometries overlap spatially, or intersect but one does not completely contain the other
• ST_Touches – returns true if one geometry touches another, but does not intersect the interior
• ST_Within – returns true if the first geometry is completely within the second geometry
• ST_Contains – returns true if the second geometry is completely contained by the first geometry (opposite of ST_Within)
``````# PostGIS, BigQuery, Redshift, and Snowflake

SELECT
ST_Intersects(geom_1, geom_2) as intersects
FROM table``````

# Other functions

## 18 . Simplify a geometry

If you want to simplify a geometry you can use these functions to simplify and return new geometries. Note there are some functions that will not preserve topology (or touching lines/relationships) but these below will. The number below is the tolerance of change in meters.

``````# PostGIS

SELECT
ST_SimplifyPreserveTopology(geom, 1) as geom
FROM table

# PostGIS, BigQuery, Redshift, and Snowflake

SELECT
ST_Simplify(geom, 1) as geom
FROM table``````

## 19. Create random points

Generate N random points within a polygon. You can designate the number or use data from your table (illustrated below). Keep in mind each function will return different things (PostGIS returns a MultiPoint, BigQuery returns an array) so you may need to use other functions if you want to extract each point into a row of it’s own:

``````# PostGIS

SELECT
ST_GeneratePoints(geom, number_column) as geom
FROM table

# BigQuery

WITH point_lists AS (
SELECT `carto-un`.carto.ST_GENERATEPOINTS(geom, number_column) AS points
FROM table
)
SELECT points FROM point_lists CROSS JOIN point_lists.points``````

## 20. Cluster points

Analyze your data by creating spatial clusters of geometries. There are two methods, DBSCAN and KMeans, for doing this. Each function will return a number for the cluster each row belongs to and it also is a window function so it uses the OVER() operation here as well:

``````# PostGIS and BigQuery

SELECT
geom,
ST_ClusterDBSCAN(geom, desired_distance, min_geoms_per_cluster) OVER() as cluster_id
FROM table``````
``````# PostGIS and BigQuery

SELECT
geom,
ST_ClusterKMeans(geom, number_of_clusters) OVER() as cluster_id
FROM table``````

# How to use spatial functions

There are many ways to use spatial functions and depending on your need, you can leverage these functions a few ways. These are just some examples, but fairly common use cases.

## Single Geometry

First, you can query against a single geometry to other rows in your table:

``````SELECT
ST_Distance(
geom,
ST_GeomFromText('POINT(-73.9895258 40.7413529)')) as distance
FROM table``````

## Same Table

You can also compare against values in your same table – this example uses a subquery to illustrate it.

``````SELECT
ST_Distance(
geom,
(SELECT geom FROM table WHERE id = 1)) as distance
FROM table``````

## WHERE Clause

Spatial functions can also be used in a WHERE clause to limit results:

``````SELECT
*
FROM table
WHERE ST_Distance(
geom,
(SELECT geom FROM table WHERE id = 1)) < 100``````

## Joins

Joins are also used to join two tables using a spatial function as a condition, oftentimes for aggreagations and spatial joins:

``````SELECT
a.id,
COUNT(b.id) as count
FROM table a
JOIN other_table b
ON ST_Contains(a.geom, b.geom)
GROUP BY a.id``````

## Cross Joins

Cross joins can be used a number of ways, but essentially evaluate every value of one table against the values of the other table. They can also be used to perform some more complex “bulk loop joins”, which we will detail below.

``````SELECT
ST_Distance(a.geom, b.geom) as distance
a.id as id_one,
b.id as id_two
FROM table a, other_table b``````

## Subqueries

Subqueries, like those shown above, can allow you to retrieve specific values or aggregates from other tables:

``````SELECT
a.id,
(SELECT COUNT(id) FROM table WHERE ST_DWithin(a.geom, geom, 400)) as within_400m
FROM other_table a``````

## Aggregations

Our join example above illustrates how aggregations can be used, but there are many others as well. Check out this post for more details and use cases.

# Other complex or specific functions

## 21. H3 Cells

The CARTO Spatial Extension provides functionality to create and manage H3 cells and other features. This query will create H3 cells within the boundary of a specific geometry (using the BigQuery table syntax:

``````WITH
mn AS (
SELECT
geom
FROM
project.dataset.table
WHERE
name = 'Minnesota')

SELECT
h3,
`carto-un`.carto.H3_BOUNDARY(h3) AS geom
FROM (
SELECT `carto-un`.carto.H3_POLYFILL(geom, 5) AS cells
FROM
mn),
UNNEST(cells) AS h3``````

## 22. Create map tiles

You can also create map tiles in spatial SQL. The video below details how do to this with some functions in PostGIS and pg_tileserv.

You can also do this in various data warehouses using the CARTO Spatial Extension, illustrated here in BigQuery:

``````CALL `carto-un`.carto.CREATE_TILESET(
R'''(
SELECT geom, type
)
''',
STRUCT(
"Tileset name" AS name,
"Tileset description" AS description,
NULL AS legend,
0 AS zoom_min,
10 AS zoom_max,
"geom" AS geom_column_name,
NULL AS zoom_min_column,
NULL AS zoom_max_column,
1024 AS max_tile_size_kb,
"RAND() DESC" AS tile_feature_order,
true AS drop_duplicates,
R'''
"version": "1.0.0",
"layer": "layer1"
}
)
);``````

## 23. Routing with roads or other networks

PgRouting provides routing functionality in PostGIS and you can calculate routes using some simple functions. You can find more info in this tutorial.

``````SELECT * FROM pgr_dijkstra(
'
SELECT gid AS id,
source,
target,
length AS cost
FROM ways
',
1,
2,
directed := false);``````

You can do the sample with the CARTO Analytics Toolbox. After creating a network, you can calculate the shortest paths:

``````CALL `carto-un`.carto.FIND_SHORTEST_PATH_FROM_NETWORK_TABLE(
"mydataset.network_table",
"mydataset.shortest_path_table",
"ST_GEOGPOINT(-74.0, 40.0)",
"ST_GEOGPOINT(-75.0, 41.0)"
);``````

# Sample cookbook recipes

## 24. Simple nearest neighbor

A very common problem, solving the nearest neighbor for one specific value, increasing the LIMIT will add more results:

``````# PostGIS

SELECT id, (SELECT geom FROM another_table LIMIT 1) <-> geom AS distance
FROM table
ORDER BY distance
LIMIT 1``````
``````# BigQuery

SELECT
a.id,
ARRAY_AGG(b.geoid ORDER BY st_distance(a.geom,
b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as id,
ARRAY_AGG(st_distance(a.geom,
b.geom) ORDER BY st_distance(a.geom,
b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as dist

FROM
project.dataset.table_1 a
CROSS JOIN project.dataset.table_2 b
GROUP BY a.id``````

## 25. Calculate the percentage overlap between polygons

Calculate the percentage overlap of polygons with ST_Intersection and ST_Area. You can do this for one polygon for every row in a table or in a where clause to find overlapping areas that intersect a certain amount.

``````# PostGIS

WITH a AS (SELECT geom FROM table WHERE id = 1)

SELECT
ST_Area(ST_Intersection(geom, (SELECT geom FROM a))/ST_Area(geom) AS overlap
FROM table
ORDER BY overlap``````
``````# PostGIS

SELECT
COUNT(a.column),
b.id,
b.geom
FROM table b
LEFT JOIN table_2 a
ON ST_Intersects(a.geom, b.geom)
WHERE ST_Area(ST_Intersection(a.geom, b.geom)/ST_Area(geom) > .5``````

## 26. Find the distance between the two nearest points on two polygons that do not touch

``````# PostGIS

SELECT
ST_Distance(ST_ClosestPoint(a.geom,
b.geom),
ST_ClosestPoint(b.geom,
a.geom)) AS distance,
ST_MakeLine(ST_ClosestPoint(a.geom,
b.geom),
ST_ClosestPoint(b.geom,
a.geom)) AS geom,
a.objectid
FROM
table a
CROSS JOIN LATERAL (
SELECT
geom
FROM
table_2
WHERE
ST_Disjoint(a.geom,
geom)
AND a.objectid != objectid
ORDER BY
ST_Distance(geom,
a.geom) ASC LIMIT 1) b``````
``````# BigQuery

SELECT
ST_MakeLine(ST_ClosestPoint(a.geom,
b.geom),
ST_ClosestPoint(b.geom,
a.geom)) AS geom,
ST_Distance(ST_ClosestPoint(a.geom,
b.geom),
ST_ClosestPoint(b.geom,
a.geom)) AS geom,
a.id
FROM
table_one a
LEFT JOIN (
SELECT
one.geom,
two.id,
ROW_NUMBER() OVER (PARTITION BY two.id ORDER BY ST_Distance(one.geom, two.geom) ASC) AS rank
FROM
table_one one,
table_two two
WHERE
ST_Disjoint(one.geom,
two.geom)
ORDER BY
ST_Distance(one.geom,
two.geom) ASC ) b
USING
(id)
WHERE
b.rank = 1``````

## 27. Create a buffer around a geometry and clip out areas from the buffer

Take a geometry, create a buffer, then clip out areas from another layer from the buffer.

``````# PostGIS

SELECT
ST_Difference(
ST_Union(
ST_Buffer(a.geom, 1609)
),
ST_Union(b.geom))
as clipped_geom
FROM table a
LEFT JOIN table_2 b
ON ST_Intersects(a.geom, b.geom)

# BigQuery

SELECT
ST_Difference(
ST_Union_Agg(
ST_Buffer(a.geom, 1609)
),
ST_Union(b.geom))
as clipped_geom
FROM table a
LEFT JOIN table_2 b
ON ST_Intersects(a.geom, b.geom)``````

## 28. Bulk nearest neighbor joins

Join two tables together by finding the nearest nieghbor from one table to all rows in another table. Inspired by this post by Paul Ramsey.

``````# PostGIS

SELECT
a.geom,
b.name,
ST_Distance(a.geom, b.geom) AS distance
FROM
table a
CROSS JOIN LATERAL
(SELECT name, geom
FROM table_2 b
ORDER BY
a.geom <--> geom
LIMIT 1) b``````
``````# BigQuery

SELECT
a.id,
ARRAY_AGG(b.geoid ORDER BY st_distance(a.geom,
b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as id,
ARRAY_AGG(st_distance(a.geom,
b.geom) ORDER BY st_distance(a.geom,
b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as dist

FROM
table_one a
CROSS JOIN table_two b

GROUP BY a.id``````

## 29. Calculate the averages of values in neighboring polygons

For each polygon, find the average (or other aggregate value) of it’s neighboring polygons.

``````# BigQuery

WITH
one AS (
SELECT
a.name,
SUM(b.value) AS total
FROM
table a
LEFT JOIN
table_2 b
ON
ST_Touches(b.geom, a.geom)
GROUP BY
a.name)

SELECT
one.*,
b.geom
FROM
one
JOIN
table b
USING
(name)``````
``````# PostGIS

SELECT
a.geom,
a.name,
SUM(b.value) as total
FROM
table a
CROSS JOIN LATERAL
(SELECT value
FROM table_two
WHERE
ST_Intersects(geom, a.geom)
) AS b
GROUP BY a.name, a.geom``````

## 30. Aggregate points into H3 cells

``````# BigQuery

WITH
data AS (
SELECT
`carto-un`.carto.H3_FROMGEOGPOINT(geog, 4) AS h3id,
COUNT(*) AS agg_total
FROM `cartobq.docs.starbucks_locations_usa`
GROUP BY h3id
)
SELECT
h3id,
agg_total,
`carto-un`.carto.H3_BOUNDARY(h3id) AS geom
FROM
data``````

## 31. Rank nearest N neighbors row by row

For each row in one table, create a new set of N rows with the N nearest neighbors with a ranking. In short, each row in your evaluated table will return N new rows based on how many neighbors you want to find.

``````# PostGIS

SELECT
a.id,
t.geom,
ST_Distance(t.geom, a.geom) as closest_dist
FROM a
CROSS JOIN LATERAL (
SELECT *,
ROW_NUMBER() OVER(ORDER BY a.geom <-> geom) AS rank
FROM seattle_parks
LIMIT 5
) as t``````
``````# BigQuery

SELECT
b.id,
a.station_id,
a.rank,
a.name,
a.dist
FROM
table b
LEFT JOIN (
SELECT
b.station_id,
b.name,
ST_Distance(c.geom, b.geom) AS dist,
b.id,
ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY ST_Distance(c.geom, b.geom) ASC) AS rank
FROM
table b,
table_2 h ) a
USING
(id)
WHERE
rank < 4``````

# Vanilla SQL Reference

There are a few vanilla SQL references here that I want to call out as well that will help you as you start to combine more complex analysis:

## Joins

There are many different types of joins but they generally follow this syntax:

``````SELECT
a.column,
b.another_column
FROM a
JOIN b ON a.column = b.column``````

You can learn more about joins in this post. Specifically related to spatial SQL you will see joins using spatial relationships:

``````# main query...
JOIN b ON ST_Intersects(a.geom, b.geom)``````

The reason this works is that the function returns a boolean (true/false) similar to the return value of the equation below:

``````# main query...
JOIN b ON a.column = b.column``````

So the join will only complete where those values return true

## CREATE TABLE

If you want to create a new table with the results of your query, that is actually quite easy:

``````CREATE TABLE new_table AS
SELECT * FROM a LIMIT 10``````

Just replace that simple query with any other query you want!

## Common Table Expressions and Subqueries

Common Table Expressions or CTEs allow you to create “temporary tables” that exist only in the context of the query you are running. You can use these to filter or aggregate data at the top of your query chain. You can also chain them together:

``````WITH a AS (SELECT geom, column FROM a)
SELECT * FROM a

# Or chain multiple together

WITH a AS (SELECT geom, column FROM a),
b as (SELECT geom, column FROM b)
SELECT
a.geom,
b.column
FROM a
JOIN b USING (column)``````

Subqueries work in a similar fashion but in the context of the query itself:

``````SELECT
*
FROM a
WHERE ST_Intersects(geom, (SELECT geom FROM b LIMIT 1))``````

It is also easy to add a new column to a table and update the results of that table using a function:

``````ALTER TABLE table ADD COLUMN centroid GEOMETRY

# To update the new column

UPDATE table SET centroid = ST_Centroid(geom)``````

## Window Functions

Window functions allow you to define an operation over a specific window of data you want to study. You can define the data within this window, the order of the data, and more. My favorite example to illustrate this is a 7 day rolling average:

``````SELECT
*,
AVG(value) OVER (PARTITION BY id ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as rolling_average
FROM table``````