Spatial SQL Cookbook: Common GIS Analyses in Spatial SQL

One of the biggest challenges, at least for me, when getting started with spatial SQL is having a quick and easy reference to transfer your current GIS workflows into SQL. There are many amazing resources out there to expand this knowledge, but this guide is meant to be a really simple cookbook to start to translate your current workflows into spatial SQL.

The Spatial SQL Book – Available now!

Check out my new book on Spatial SQL with 500+ pages to help you go from SQL novice to spatial SQL pro.

Other resources

A few things to note:

  • I will write SQL code for PostGIS and common data warehouses
  • Not every data warehouse supports projections – for example, BigQuery and Snowflake only support WGS 84 SRID 4326
  • This guide was published in February 2022 and updates will be made as needed

Working with geometries

1. Create a point geometry from a latitude/longitude pair

This query will create a geometry from numeric columns that contain latitude and longitude values.

# PostGIS

SELECT
ST_SetSRID(ST_MakePoint(lng, lat), 4326) as point_geom
FROM table

# BigQuery

SELECT
ST_GEOGPOINT(lng, lat) as point_geom
FROM table

# Snowflake and Redshift

SELECT
ST_MakePoint(lng, lat) as point_geom
FROM table

2. Create a geometry from Well Known Text

Similar to above, this will create a geometry from a WKT string:

# PostGIS, Redshift

SELECT
ST_GeomFromText('POINT(-71.064544 42.28787)', 4326) as geom
FROM table

# BigQuery

SELECT
ST_GEOGFROMTEXT('POINT(-71.064544 42.28787)') as geom
FROM table

# Snowflake

SELECT
ST_GEOGFROMWKT('POINT(-71.064544 42.28787)') as geom
FROM table

3. Change your data to a different projection

You can re-project your data using a simple function to a new projection using this query:

# PostGIS and Redshift

SELECT
ST_Transform(geom, 4326) as geom
FROM table

4. Get a latitude and longitude from a geometry

Using a point geometry you can extract the latitude and longitude as numeric values:

# PostGIS, BigQuery, Snowflake, and Redshift

SELECT
ST_X(geom) as longitude,
ST_Y(geom) as latitude
FROM table

Measurements

5. Find the area of a polygon

Find the area of a polygon in square meters

# PostGIS, BigQuery, Snowflake, and Redshift

SELECT
ST_Area(geom) as area
FROM table

6. Find the perimeter of a polygon

Similar to above but perimeter of a polygon in meters

# PostGIS, BigQuery, Snowflake, and Redshift

SELECT
ST_Perimeter(geom) as area
FROM table

7. Find the length of a line

Calculate the length of a line in meters

# PostGIS, BigQuery, Snowflake, and Redshift

SELECT
ST_Length(geom) as length
FROM table

8. Find the distance between to geometries

Returns the distance in meters between two geometries. Each of the different functions has a different signature for using the spheroid, or curved earth, calculation so make sure to check the docs in this case.

# PostGIS and BigQuery

SELECT
ST_Distance(geom_1, geom_2) as distance
FROM table

9. Find the shortest distance, to a specific point, between two geometries

Given two geometries, find the closest point on geometry 2 to geometry 1. The function will return a new point that represents the closest point on the second geometry to the first geometry.

# PostGIS and BigQuery

SELECT
ST_ClosestPoint(geom_1, geom_2) as closest_point
FROM table

Transform geometries

10. Create a buffer around a geometry

Create a buffer around a geometry, which returns a new geometry. The number in the function below is the distance in meters for the buffer.

# PostGIS, BigQuery, and Redshift

SELECT
ST_Buffer(geom, 100) as closest_point
FROM table

11. Centroid of a polygon or line

If you want to find the centroid of a geometry you can use this query to return a point of the centroid of a geometry:

# PostGIS, BigQuery, Snowflake, and Redshift

SELECT
ST_Centroid(geom) as centroid
FROM table

12. Create a concave or convex hull

You can create a concave or convex hull from a geometry or group of geometries, which will return a new geometry:

# PostGIS, BigQuery, and Redshift

SELECT
ST_ConvexHull(geom) as convex_hull
FROM table

SELECT
ST_ConcaveHull(geom) as convex_hull
FROM table

You can also do this by grouping geometries:

# ST_Collect will collect the matching geometries together into a GeometryCollection

SELECT 
ST_ConvexHull(ST_Collect(geom)) as convex_hull,
category
FROM table
GROUP BY category

13. Union geometries together

If you want to union your geometries, use ST_Union to return a new geometry:

# PostGIS and BigQuery

SELECT
ST_Union(geom) as geom,
category
FROM table
GROUP BY category

14. Create Voronoi polygons

Create Voronoi polygons around your geometries, returning new geometries:

# PostGIS

SELECT
ST_VoronoiPolygons(geom) as voronoi_polygons
FROM table

15. Find the resulting polygon of an intersection

Commonly know as a clip in GIS, use this function to return the resulting intersecting area from two geometries:

# PostGIS, BigQuery, Redshift, and Snowflake

SELECT
ST_Intersection(geom_1, geom_2) as intersection
FROM table

Or if you want to find the leftover parts of a geometry relations ship you can use ST_Difference:

# PostGIS, BigQuery, Redshift, and Snowflake

SELECT
ST_Difference(geom_1, geom_2) as intersection
FROM table

Spatial Relationships

16. Find geometries within a certain distance

Find geometries within a certain distance of another geometry, commonly used in aggregation queries or WHERE clauses:

# PostGIS, BigQuery, Redshift, and Snowflake

SELECT
*
FROM table
WHERE ST_DWithin(
	geom_1, 
	ST_GeomFromText('POINT(-73.9895258 40.7413529)',4326),
	1000)

17. See if two geometries overlap, touch, cross, intersect, contain, etc. (or evaluate spatial relationships)

This is a more complicated one as there are several different functions, each with slight differences between them, to evaluate spatial relationships. Make sure to check the docs of the database or data warehouse you are using as these may vary, or may have different tolerances.

Below are the core functions for spatial relationships for PostGIS, which you can find more detailed information on here. Also, each function will return a boolean (true/false) based on the outcome of the spatial relationship which we can use a few ways (more below).

  • ST_Equals – returns true if the two geometries are exactly equal
  • ST_Intersects – returns true if the two geometries share any space in common
  • ST_Disjoint – returns true is the two geometries do not share any space, or the opposite of ST_intersects
  • ST_Crosses – Returns true if the geometries have some, but not all, interior points in common
    • In more detail, “returns true if the intersection results in a geometry whose dimension is one less than the maximum dimension of the two source geometries and the intersection set is interior to both source geometries”.
    • If the geometries intersect but also the intersection result is one less dimension (or it crosses the intersecting geometry one time) it will return true. Works for multipoint/polygon, multipoint/linestring, linestring/linestring, linestring/polygon, and linestring/multipolygon comparisons.
  • ST_Overlaps – returns true if the two geometries overlap spatially, or intersect but one does not completely contain the other
  • ST_Touches – returns true if one geometry touches another, but does not intersect the interior
  • ST_Within – returns true if the first geometry is completely within the second geometry
  • ST_Contains – returns true if the second geometry is completely contained by the first geometry (opposite of ST_Within)
# PostGIS, BigQuery, Redshift, and Snowflake

SELECT
ST_Intersects(geom_1, geom_2) as intersects
FROM table

Other functions

18 . Simplify a geometry

If you want to simplify a geometry you can use these functions to simplify and return new geometries. Note there are some functions that will not preserve topology (or touching lines/relationships) but these below will. The number below is the tolerance of change in meters.

# PostGIS

SELECT
ST_SimplifyPreserveTopology(geom, 1) as geom
FROM table

# PostGIS, BigQuery, Redshift, and Snowflake

SELECT
ST_Simplify(geom, 1) as geom
FROM table

19. Create random points

Generate N random points within a polygon. You can designate the number or use data from your table (illustrated below). Keep in mind each function will return different things (PostGIS returns a MultiPoint, BigQuery returns an array) so you may need to use other functions if you want to extract each point into a row of it’s own:

# PostGIS

SELECT
ST_GeneratePoints(geom, number_column) as geom
FROM table

# BigQuery

WITH point_lists AS (
  SELECT `carto-un`.carto.ST_GENERATEPOINTS(geom, number_column) AS points
  FROM table
)
SELECT points FROM point_lists CROSS JOIN point_lists.points

20. Cluster points

Analyze your data by creating spatial clusters of geometries. There are two methods, DBSCAN and KMeans, for doing this. Each function will return a number for the cluster each row belongs to and it also is a window function so it uses the OVER() operation here as well:

# PostGIS and BigQuery

SELECT
geom,
ST_ClusterDBSCAN(geom, desired_distance, min_geoms_per_cluster) OVER() as cluster_id
FROM table
# PostGIS and BigQuery

SELECT
geom,
ST_ClusterKMeans(geom, number_of_clusters) OVER() as cluster_id
FROM table

How to use spatial functions

There are many ways to use spatial functions and depending on your need, you can leverage these functions a few ways. These are just some examples, but fairly common use cases.

Single Geometry

First, you can query against a single geometry to other rows in your table:

SELECT
ST_Distance(
	geom,
	ST_GeomFromText('POINT(-73.9895258 40.7413529)')) as distance
FROM table

Same Table

You can also compare against values in your same table – this example uses a subquery to illustrate it.

SELECT
ST_Distance(
	geom,
	(SELECT geom FROM table WHERE id = 1)) as distance
FROM table

WHERE Clause

Spatial functions can also be used in a WHERE clause to limit results:

SELECT
*
FROM table
WHERE ST_Distance(
	geom,
	(SELECT geom FROM table WHERE id = 1)) < 100

Joins

Joins are also used to join two tables using a spatial function as a condition, oftentimes for aggreagations and spatial joins:

SELECT
a.id,
COUNT(b.id) as count
FROM table a
JOIN other_table b
ON ST_Contains(a.geom, b.geom)
GROUP BY a.id

Cross Joins

Cross joins can be used a number of ways, but essentially evaluate every value of one table against the values of the other table. They can also be used to perform some more complex “bulk loop joins”, which we will detail below.

SELECT
ST_Distance(a.geom, b.geom) as distance
a.id as id_one,
b.id as id_two
FROM table a, other_table b

Subqueries

Subqueries, like those shown above, can allow you to retrieve specific values or aggregates from other tables:

SELECT
a.id,
(SELECT COUNT(id) FROM table WHERE ST_DWithin(a.geom, geom, 400)) as within_400m
FROM other_table a

Aggregations

Our join example above illustrates how aggregations can be used, but there are many others as well. Check out this post for more details and use cases.

Other complex or specific functions

21. H3 Cells

The CARTO Spatial Extension provides functionality to create and manage H3 cells and other features. This query will create H3 cells within the boundary of a specific geometry (using the BigQuery table syntax:

WITH
  mn AS (
  SELECT
    geom
  FROM
     project.dataset.table
  WHERE
    name = 'Minnesota')

SELECT
  h3,
  `carto-un`.carto.H3_BOUNDARY(h3) AS geom
FROM (
	SELECT `carto-un`.carto.H3_POLYFILL(geom, 5) AS cells
  FROM 
    mn),
UNNEST(cells) AS h3

22. Create map tiles

You can also create map tiles in spatial SQL. The video below details how do to this with some functions in PostGIS and pg_tileserv.

You can also do this in various data warehouses using the CARTO Spatial Extension, illustrated here in BigQuery:

CALL `carto-un`.carto.CREATE_TILESET(
  R'''(
    SELECT geom, type
    FROM `carto-do-public-data.natural_earth.geography_glo_roads_410`
  )
  ''',
  R'''`cartobq.maps.natural_earth_roads`''',
  STRUCT(
    "Tileset name" AS name,
    "Tileset description" AS description,
    NULL AS legend,
    0 AS zoom_min,
    10 AS zoom_max,
    "geom" AS geom_column_name,
    NULL AS zoom_min_column,
    NULL AS zoom_max_column,
    1024 AS max_tile_size_kb,
    "RAND() DESC" AS tile_feature_order,
    true AS drop_duplicates,
    R'''
      "custom_metadata": {
        "version": "1.0.0",
        "layer": "layer1"
      }
    ''' AS extra_metadata
  )
);

23. Routing with roads or other networks

PgRouting provides routing functionality in PostGIS and you can calculate routes using some simple functions. You can find more info in this tutorial.

SELECT * FROM pgr_dijkstra(
    '
      SELECT gid AS id,
        source,
        target,
        length AS cost
      FROM ways
    ',
    1,
    2,
    directed := false);

You can do the sample with the CARTO Analytics Toolbox. After creating a network, you can calculate the shortest paths:

CALL `carto-un`.carto.FIND_SHORTEST_PATH_FROM_NETWORK_TABLE(
  "mydataset.network_table",
  "mydataset.shortest_path_table",
  "ST_GEOGPOINT(-74.0, 40.0)",
  "ST_GEOGPOINT(-75.0, 41.0)"
);

Sample cookbook recipes

24. Simple nearest neighbor

A very common problem, solving the nearest neighbor for one specific value, increasing the LIMIT will add more results:

# PostGIS

SELECT id, (SELECT geom FROM another_table LIMIT 1) <-> geom AS distance
FROM table
ORDER BY distance
LIMIT 1
# BigQuery

SELECT
  a.id,
  ARRAY_AGG(b.geoid ORDER BY st_distance(a.geom,
      b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as id,
  ARRAY_AGG(st_distance(a.geom,
      b.geom) ORDER BY st_distance(a.geom,
      b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as dist

FROM
  project.dataset.table_1 a
CROSS JOIN project.dataset.table_2 b
GROUP BY a.id

25. Calculate the percentage overlap between polygons

Calculate the percentage overlap of polygons with ST_Intersection and ST_Area. You can do this for one polygon for every row in a table or in a where clause to find overlapping areas that intersect a certain amount.

# PostGIS

WITH a AS (SELECT geom FROM table WHERE id = 1)

SELECT 
ST_Area(ST_Intersection(geom, (SELECT geom FROM a))/ST_Area(geom) AS overlap
FROM table
ORDER BY overlap
# PostGIS

SELECT
COUNT(a.column),
b.id,
b.geom
FROM table b
LEFT JOIN table_2 a 
ON ST_Intersects(a.geom, b.geom)
WHERE ST_Area(ST_Intersection(a.geom, b.geom)/ST_Area(geom) > .5

26. Find the distance between the two nearest points on two polygons that do not touch

# PostGIS

SELECT
  ST_Distance(ST_ClosestPoint(a.geom,
      b.geom),
    ST_ClosestPoint(b.geom,
      a.geom)) AS distance,
	ST_MakeLine(ST_ClosestPoint(a.geom,
      b.geom),
    ST_ClosestPoint(b.geom,
      a.geom)) AS geom,
  a.objectid
FROM
  table a
CROSS JOIN LATERAL (
  SELECT
    geom
  FROM
    table_2
  WHERE
    ST_Disjoint(a.geom,
      geom)
	AND a.objectid != objectid
  ORDER BY
    ST_Distance(geom,
      a.geom) ASC LIMIT 1) b
# BigQuery

SELECT
  ST_MakeLine(ST_ClosestPoint(a.geom,
      b.geom),
    ST_ClosestPoint(b.geom,
      a.geom)) AS geom,
  ST_Distance(ST_ClosestPoint(a.geom,
      b.geom),
    ST_ClosestPoint(b.geom,
      a.geom)) AS geom,
  a.id
FROM
  table_one a
LEFT JOIN (
  SELECT
    one.geom,
    two.id,
    ROW_NUMBER() OVER (PARTITION BY two.id ORDER BY ST_Distance(one.geom, two.geom) ASC) AS rank
  FROM
    table_one one,
    table_two two
  WHERE
    ST_Disjoint(one.geom,
      two.geom)
  ORDER BY
    ST_Distance(one.geom,
      two.geom) ASC ) b
USING
  (id)
WHERE
  b.rank = 1

27. Create a buffer around a geometry and clip out areas from the buffer

Take a geometry, create a buffer, then clip out areas from another layer from the buffer.

# PostGIS

SELECT 
ST_Difference(
   ST_Union(
       ST_Buffer(a.geom, 1609)
     ), 
  ST_Union(b.geom))
  as clipped_geom
FROM table a
LEFT JOIN table_2 b
ON ST_Intersects(a.geom, b.geom)

# BigQuery

SELECT 
ST_Difference(
   ST_Union_Agg(
       ST_Buffer(a.geom, 1609)
     ), 
  ST_Union(b.geom))
  as clipped_geom
FROM table a
LEFT JOIN table_2 b
ON ST_Intersects(a.geom, b.geom)

28. Bulk nearest neighbor joins

Join two tables together by finding the nearest nieghbor from one table to all rows in another table. Inspired by this post by Paul Ramsey.

# PostGIS

SELECT
  a.geom,
  b.name,
  ST_Distance(a.geom, b.geom) AS distance
FROM
  table a
CROSS JOIN LATERAL
  (SELECT name, geom
   FROM table_2 b
   ORDER BY
     a.geom <--> geom
   LIMIT 1) b
# BigQuery

SELECT
  a.id,
  ARRAY_AGG(b.geoid ORDER BY st_distance(a.geom,
      b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as id,
  ARRAY_AGG(st_distance(a.geom,
      b.geom) ORDER BY st_distance(a.geom,
      b.geom) ASC LIMIT 1)[SAFE_OFFSET(0)] as dist

FROM
  table_one a
CROSS JOIN table_two b

GROUP BY a.id

29. Calculate the averages of values in neighboring polygons

For each polygon, find the average (or other aggregate value) of it’s neighboring polygons.

# BigQuery

WITH
  one AS (
  SELECT
    a.name,
    SUM(b.value) AS total
  FROM
    table a
  LEFT JOIN
    table_2 b
  ON
    ST_Touches(b.geom, a.geom)
  GROUP BY
    a.name)

SELECT
  one.*,
  b.geom
FROM
  one
JOIN
  table b
USING
  (name)
# PostGIS

SELECT
  a.geom,
  a.name,
  SUM(b.value) as total
FROM
  table a
CROSS JOIN LATERAL
  (SELECT value
   FROM table_two
   WHERE 
     ST_Intersects(geom, a.geom)
  ) AS b
GROUP BY a.name, a.geom

30. Aggregate points into H3 cells

# BigQuery

WITH
  data AS (
  SELECT
    `carto-un`.carto.H3_FROMGEOGPOINT(geog, 4) AS h3id,
    COUNT(*) AS agg_total
  FROM `cartobq.docs.starbucks_locations_usa`
  GROUP BY h3id
  )
SELECT
  h3id, 
  agg_total,
  `carto-un`.carto.H3_BOUNDARY(h3id) AS geom
FROM
  data

31. Rank nearest N neighbors row by row

For each row in one table, create a new set of N rows with the N nearest neighbors with a ranking. In short, each row in your evaluated table will return N new rows based on how many neighbors you want to find.

# PostGIS

SELECT
	a.id, 
  t.geom,
  ST_Distance(t.geom, a.geom) as closest_dist
FROM a
CROSS JOIN LATERAL (
	SELECT *,
	ROW_NUMBER() OVER(ORDER BY a.geom <-> geom) AS rank
  FROM seattle_parks
  LIMIT 5
	) as t
# BigQuery

SELECT
  b.id,
  a.station_id,
  a.rank,
  a.name,
  a.dist
FROM
  table b
LEFT JOIN (
  SELECT
    b.station_id,
    b.name,
    ST_Distance(c.geom, b.geom) AS dist,
    b.id,
    ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY ST_Distance(c.geom, b.geom) ASC) AS rank
  FROM
    table b,
    table_2 h ) a
USING
  (id)
WHERE
  rank < 4

Vanilla SQL Reference

There are a few vanilla SQL references here that I want to call out as well that will help you as you start to combine more complex analysis:

Joins

There are many different types of joins but they generally follow this syntax:

SELECT
a.column,
b.another_column
FROM a
JOIN b ON a.column = b.column

You can learn more about joins in this post. Specifically related to spatial SQL you will see joins using spatial relationships:

# main query...
JOIN b ON ST_Intersects(a.geom, b.geom)

The reason this works is that the function returns a boolean (true/false) similar to the return value of the equation below:

# main query...
JOIN b ON a.column = b.column

So the join will only complete where those values return true

CREATE TABLE

If you want to create a new table with the results of your query, that is actually quite easy:

CREATE TABLE new_table AS
SELECT * FROM a LIMIT 10

Just replace that simple query with any other query you want!

Common Table Expressions and Subqueries

Common Table Expressions or CTEs allow you to create “temporary tables” that exist only in the context of the query you are running. You can use these to filter or aggregate data at the top of your query chain. You can also chain them together:

WITH a AS (SELECT geom, column FROM a)
SELECT * FROM a

# Or chain multiple together

WITH a AS (SELECT geom, column FROM a),
b as (SELECT geom, column FROM b)
SELECT
a.geom,
b.column
FROM a
JOIN b USING (column)

Subqueries work in a similar fashion but in the context of the query itself:

SELECT 
*
FROM a
WHERE ST_Intersects(geom, (SELECT geom FROM b LIMIT 1))

Add and update columns

It is also easy to add a new column to a table and update the results of that table using a function:

ALTER TABLE table ADD COLUMN centroid GEOMETRY

# To update the new column

UPDATE table SET centroid = ST_Centroid(geom)

Window Functions

Window functions allow you to define an operation over a specific window of data you want to study. You can define the data within this window, the order of the data, and more. My favorite example to illustrate this is a 7 day rolling average:

SELECT 
*,
AVG(value) OVER (PARTITION BY id ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as rolling_average
FROM table