MODERN GIS CERTIFICATION

E1: Cloud-Native Geospatial Files

This brick introduces the essential file formats powering modern, scalable spatial data workflows.

You’ll get hands-on experience with formats like:

  • Cloud-Optimized GeoTIFF (COG) for raster data
  • GeoParquet for columnar vector data

Brick E1 – Core Competencies (High-Level)

  1. Cloud Storage & Streaming I/O
    Authenticate to S3-style buckets and read/write objects on-the-fly without full downloads.

  2. Raster → Cloud-Optimized GeoTIFF
    Transform GeoTIFFs into COGs with internal tiling, overviews, and best-practice compression.

  3. Vector → GeoParquet & Spatial Partitioning
    Convert vector data in memory to GeoParquet (v1.1), tune row-group size, and partition by a spatial key.

  4. Lightweight Validation & CI
    Extract minimal metadata (COG footprint, Parquet schema), write pytest checks, and automate grading.

Lesson Skills

E1 Certified Skills

The full list of skills and tools covered in this certification track. Anyone holding the validated badge for this track has used them hands-on.

– Understand AWS S3 concepts (buckets, objects, prefixes, authentication)
– Configure and connect to an object store (anonymous vs. signed access)
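
As a minimal sketch of the bucket / object / prefix vocabulary above, the snippet below splits an S3-style URL using only the Python standard library. The bucket name and key are made up for illustration, and `split_s3_url` and `under_prefix` are hypothetical helpers, not part of any S3 client.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/some/prefix/object' into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    # The key is everything after the bucket name. A "prefix" is simply a
    # leading part of the key: object stores have no real directories.
    return parsed.netloc, parsed.path.lstrip("/")


def under_prefix(key, prefix):
    """Return True if an object key falls under the given prefix."""
    return key.startswith(prefix)


# Hypothetical bucket and object, for illustration only.
bucket, key = split_s3_url("s3://example-bucket/rasters/2024/scene.tif")
```

Listing "everything under `rasters/2024/`" is then a string-prefix filter on keys, which is how object stores emulate folders.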

– Use fsspec and obstore (and the s3fs backend) to read and write remote files as if they were local
– Stream data tile-by-tile (no full file download)
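
Tile-by-tile streaming rests on HTTP range requests: the client asks for only the bytes of the tile it needs. The sketch below builds such a header with plain Python; the offset and length are invented values standing in for what a reader would look up in the file's tile index, and `range_header` is a hypothetical helper.

```python
def range_header(offset, length):
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the "- 1".
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


# Hypothetical tile location: 64 KiB starting 16 KiB into the file.
header = range_header(offset=16384, length=65536)
```

Libraries such as fsspec and obstore issue requests of this kind for you when you seek and read on a remote file object.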

– Use Rasterio and rio-cogeo to transform plain GeoTIFFs into COGs
– Tune internal tiling (blockxsize, blockysize) and overviews for optimal HTTP-range reads
– Apply best-practice compression (DEFLATE, ZSTD, etc.) and predictors
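
To make the overview tuning concrete, here is a small sketch of one common heuristic: keep halving the resolution until the whole image fits within a single tile. This is an assumption about a reasonable default, not necessarily the rule any particular tool applies, and `overview_factors` is a hypothetical helper.

```python
import math


def overview_factors(width, height, blocksize=512):
    """Return decimation factors (2, 4, 8, ...) so that the coarsest
    overview fits within one blocksize x blocksize tile."""
    factors = []
    factor = 1
    # Keep doubling the decimation while the reduced image is still
    # larger than a single tile along its longest side.
    while math.ceil(max(width, height) / factor) > blocksize:
        factor *= 2
        factors.append(factor)
    return factors


# A 10000 x 10000 raster with 512-pixel tiles.
levels = overview_factors(10000, 10000, blocksize=512)
```

A list like this is what you would typically hand to an overview-building step; an image already smaller than one tile needs no overviews at all.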

– Read Shapefiles (or GeoPackages) in memory with GeoPandas (and GDAL’s /vsimem/ virtual filesystem)
– Write out GeoParquet with PyArrow, setting row-group size, compression, and GeoParquet version
– Understand how row-grouping and spatial keys (geohash, Hilbert) impact query performance
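
As an illustration of a spatial key, the sketch below computes a Hilbert-curve index for a grid cell in pure Python, using the standard iterative formulation of the curve. The grid quantisation in `lonlat_to_cell` is a simplifying assumption for illustration; both function names are hypothetical.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) on an n-by-n grid (n a power of two) to its
    distance along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def lonlat_to_cell(lon, lat, n):
    """Quantise lon/lat in degrees onto an n-by-n grid (assumed layout)."""
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return col, row


# Key for a feature centroid near lon=2.35, lat=48.85 on a 1024 x 1024 grid.
col, row = lonlat_to_cell(2.35, 48.85, 1024)
spatial_key = hilbert_index(1024, col, row)
```

Sorting features by a key like this before writing tends to place nearby features in the same row group, so a reader filtering by bounding box can skip most row groups.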

– Leverage Rasterio’s MemoryFile (backed by GDAL’s /vsimem/) for zero-scratch-disk conversions
– Use io.BytesIO or obstore’s file-like readers/writers for pure in-Python pipelines

– Choose appropriate tile sizes (256 × 256 vs. 512 × 512) based on use case
– Select compression codecs (lossless vs. lossy) for data vs. visualization needs
– Partition vector data (quad-tree, k-d tree) for spatial pruning
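
The tile-size trade-off can be sketched with a little arithmetic: smaller tiles mean more requests, larger tiles mean more unneeded pixels per request. The window below is an invented example and `tiles_touched` is a hypothetical helper.

```python
def tiles_touched(col_off, row_off, width, height, blocksize):
    """Count the internal tiles a pixel window overlaps."""
    first_col = col_off // blocksize
    last_col = (col_off + width - 1) // blocksize
    first_row = row_off // blocksize
    last_row = (row_off + height - 1) // blocksize
    return (last_col - first_col + 1) * (last_row - first_row + 1)


# A 1000 x 1000 pixel window starting at pixel (100, 100).
window = dict(col_off=100, row_off=100, width=1000, height=1000)

tiles_256 = tiles_touched(blocksize=256, **window)
tiles_512 = tiles_touched(blocksize=512, **window)

# Pixels actually transferred, since whole tiles are fetched.
pixels_256 = tiles_256 * 256 * 256
pixels_512 = tiles_512 * 512 * 512
```

For this window, 256-pixel tiles need more range requests but move fewer pixels, while 512-pixel tiles need fewer requests but fetch more data that falls outside the window.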

– Inspect COG metadata and validate with rio_cogeo.cog_validate / cog_info
– Read GeoParquet footers to assert schema, row groups, and GeoParquet metadata
– Write automated tests (pytest) that verify your outputs without downloading entire files
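
A Parquet file ends with its footer metadata, then a 4-byte little-endian footer length, then the magic bytes `PAR1`. The sketch below uses that layout to work out which byte range holds the footer from only the last 8 bytes of a file, with a pytest-style check on synthetic bytes. `footer_byte_range` is a hypothetical helper, and the sizes are invented.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def footer_byte_range(tail, file_size):
    """Given the last 8 bytes of a Parquet file and its total size,
    return (start, length) of the footer metadata."""
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("tail does not end with the Parquet magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    start = file_size - 8 - footer_len
    # The file also begins with a 4-byte magic, so the footer cannot
    # start before offset 4.
    if start < 4:
        raise ValueError("footer length is larger than the file allows")
    return start, footer_len


def test_footer_byte_range():
    # Synthetic tail: a 1000-byte footer in a 5000-byte file.
    tail = struct.pack("<I", 1000) + PARQUET_MAGIC
    assert footer_byte_range(tail, file_size=5000) == (3992, 1000)
```

In practice two small range reads (the 8-byte tail, then the footer itself) are enough to inspect schema and row-group metadata; libraries such as PyArrow handle this when given a seekable remote file object.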

– Package your workflow into a Gitpod/GitHub Codespaces environment
– Automate dependencies (GDAL, rasterio, geopandas, rio-cogeo) via Docker/Gitpod config
– Integrate CI checks that grade your notebook outputs and issue badges

Affordable plans

Flexible pricing to suit every team size

Choose from scalable pricing options that grow with your team’s needs and budget.

🔥 START E1 BRICK 🔥

E1 Brick

Learn the complete track and how to interweave skills in a Capstone seminar

$49
  • Full E1 Brick content
  • Certification badge on completion
  • Cloud resources and code

Full Brick Track

Learn the complete track and how to interweave skills in a Capstone seminar

$149
  • Access to all 4 E-Track Bricks
  • Certifications for each Brick
  • Capstone seminar for the track
  • Capstone certification

Spatial Lab Membership

Get full access to additional tracks, community resources, and live cohort events.

$19/month
  • Access to the E-Track and other upcoming tracks
  • Community-supported learning alongside other learners
  • Access to live events for this track
  • Much more...