AI-Powered Crop Classification with NASA, IBM, and Hugging Face: A Complete Beginner Tutorial

Foundation models are changing what’s possible in geospatial analysis. Tasks that once required thousands of labeled samples and expensive GPU training can now be run in minutes using pre-trained models built on massive Earth observation datasets.

One of the best examples is the NASA–IBM Prithvi model, hosted on Hugging Face. Its fine-tuned versions can classify cropland types, detect burn scars, map floods, and perform other remote-sensing tasks without requiring you to train anything yourself.

In this tutorial-style article, we walk through:

  • What Earth observation foundation models are
  • How the NASA–IBM Prithvi model works
  • How to download and prepare multi-temporal satellite data
  • How to structure the data for inference
  • How to use QGIS to combine bands and create the required 18-band GeoTIFF
  • How to run crop classification directly through Hugging Face
  • What the outputs look like and what to do next

By the end, you’ll understand how to run state-of-the-art geospatial AI on real satellite imagery with no training data and no GPUs.


What Foundation Models Mean for Geospatial Machine Learning

Training a deep learning model for imagery classification usually requires:

  • Large labeled datasets
  • Significant compute (often GPUs)
  • Expertise in model architectures
  • Time to tune, evaluate, and iterate

NASA and IBM have already done that work for you. They trained a multi-temporal, Transformer-based foundation model on large volumes of Harmonized Landsat and Sentinel-2 (HLS) imagery, creating a pretrained representation of Earth observation patterns.

You get the benefits:

  • No need to gather thousands of training samples
  • No need to rent GPU compute
  • No need to design your own deep learning model
  • Immediate access to highly accurate predictions

Your only job is to prepare the imagery in the correct format and run inference.


How the NASA–IBM Prithvi Crop Classification Model Works

The model takes a multi-temporal stack of imagery—three snapshots in time—with six spectral bands per snapshot:

  • Blue (B02)
  • Green (B03)
  • Red (B04)
  • Narrow NIR (B8A)
  • SWIR 1 (B11)
  • SWIR 2 (B12)

That means you must supply an 18-band raster in the correct order:

[B02, B03, B04, B8A, B11, B12]  x  3 time periods
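
To make the required order concrete, here is a minimal Python sketch that enumerates the 18 band slots. The names BANDS and expected_band_order and the labels t1 to t3 are illustrative only and not part of any Prithvi or Hugging Face API. The sketch assumes the earliest date comes first.

```python
# Six HLS Sentinel-2 bands, in the order listed above.
BANDS = ["B02", "B03", "B04", "B8A", "B11", "B12"]

def expected_band_order(n_times=3):
    """Return (time_label, band) pairs for the stacked raster, one date at a time."""
    return [(f"t{t}", band) for t in range(1, n_times + 1) for band in BANDS]

order = expected_band_order()
for index, (time_label, band) in enumerate(order, start=1):
    print(f"band {index:2d}: {time_label} {band}")
```

Bands 1 to 6 hold the first date, bands 7 to 12 the second, and bands 13 to 18 the third.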

The model then assigns each pixel a crop or land-cover class, such as:

  • Corn
  • Soybeans
  • Wheat
  • Alfalfa
  • Cotton
  • Natural vegetation
  • Wetlands
  • Urban or developed areas

Hugging Face provides a clean user interface where you simply upload the stacked GeoTIFF and get back a color-coded classification map.


Step 1: Download NASA HLS Sentinel-2 Imagery

To run the model, you need surface reflectance data from the Sentinel-2 product (S30) of the Harmonized Landsat and Sentinel-2 (HLS) dataset. You can download it via:

https://search.earthdata.nasa.gov

The video tutorial focuses on cropland in southern Minnesota. After drawing a bounding box around your area, filter for:

HLS Sentinel-2 MSI Surface Reflectance (30m), Version 2

Then choose three cloud-free scenes during the growing season—for example:

  • July (DOY 194)
  • August (DOY 224)
  • September (DOY 264)
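
HLS scenes are labeled by day of year (DOY), so it can help to translate those numbers into calendar dates when picking scenes. The sketch below uses only the Python standard library. The year 2023 is a placeholder assumption, since the article does not state which year was used.

```python
from datetime import date, timedelta

def doy_to_date(year, doy):
    """Convert a 1-based day of year into a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)

# 2023 is a placeholder year (an assumption); swap in the year of your scenes.
for doy in (194, 224, 264):
    print(doy, doy_to_date(2023, doy).isoformat())
```

In a leap year each of these dates falls one day earlier.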

For each date, download the six required bands:

  • B02 (blue)
  • B03 (green)
  • B04 (red)
  • B8A (narrow NIR)
  • B11 (SWIR 1)
  • B12 (SWIR 2)

These come as six individual GeoTIFFs per date.
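
With 18 separate files, it is easy to lose track of the order. The sketch below sorts file names by acquisition date and then by the band order the model expects. It assumes the HLS v2.0 naming pattern HLS.S30.T<tile>.<YYYYDDD>T<HHMMSS>.v2.0.<band>.tif, and the tile ID and timestamp in the example are hypothetical. The helper names are illustrative.

```python
from pathlib import Path

BAND_ORDER = ["B02", "B03", "B04", "B8A", "B11", "B12"]

def sort_key(path):
    """Sort by acquisition date (YYYYDDD), then by the model's band order."""
    parts = Path(path).name.split(".")
    acquired = parts[3][:7]   # e.g. "2023194" from "2023194T170859"
    band = parts[-2]          # e.g. "B8A"
    return (acquired, BAND_ORDER.index(band))

def order_for_stack(paths):
    """Return the files in the order they should appear as raster bands."""
    return sorted(paths, key=sort_key)

# Hypothetical file names for one tile and one date, deliberately shuffled.
files = [f"HLS.S30.T15TVK.2023194T170859.v2.0.{b}.tif"
         for b in ["B11", "B12", "B8A", "B02", "B03", "B04"]]
for name in order_for_stack(files):
    print(name)
```

A plain alphabetical sort would place B8A last, which is why the key looks up each band's position in BAND_ORDER instead of comparing names.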


Step 2: Organize and Stack Bands in QGIS

Once you have all 18 band files (six bands for each of the three dates), load them into QGIS.

Merge all bands into one 18-band GeoTIFF

  1. Go to Raster → Miscellaneous → Merge.
  2. Select all 18 files.
  3. Ensure “Place each input file into a separate band” is checked.
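  4. Make sure the files are ordered by date and, within each date, as B02, B03, B04, B8A, B11, B12. An alphabetical sort puts B8A after B12, so drag B8A above B11 where needed.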
  5. Run the merge.

QGIS creates a temporary 18-band raster.
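
The QGIS Merge tool wraps the GDAL gdal_merge utility, so the same step can be scripted. The sketch below only builds the command line. Running it requires GDAL to be installed, and on some systems the script is named gdal_merge rather than gdal_merge.py. The function name and file names are placeholders, not part of the original workflow.

```python
import shlex

def build_merge_command(ordered_files, output="stack_18band.tif"):
    """Build a gdal_merge.py call that puts each input file in its own band."""
    if len(ordered_files) != 18:
        raise ValueError(f"expected 18 files, got {len(ordered_files)}")
    # -separate is the command-line form of
    # "Place each input file into a separate band".
    return ["gdal_merge.py", "-separate", "-o", output, *ordered_files]

# Placeholder names; real inputs would be the downloaded HLS GeoTIFFs,
# already sorted by date and band.
inputs = [f"t{t}_{b}.tif" for t in (1, 2, 3)
          for b in ("B02", "B03", "B04", "B8A", "B11", "B12")]
command = build_merge_command(inputs)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Band order in the output follows the order of the input list, so the list must already be sorted correctly before the command is built.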

Clip to a small area

Because the Hugging Face model runs on shared compute, large files can take a long time.

Use Raster → Extraction → Clip Raster by Extent and define a tight bounding box around your area of interest.

Export the clipped version as a GeoTIFF. This final file is the one you upload to the model.
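
The clip can also be scripted with gdal_translate, which is what the QGIS tool runs underneath. This sketch builds the command for a bounding box given in the raster's own coordinate system (UTM metres for HLS tiles). The coordinates and file names are hypothetical, and GDAL must be installed to actually run the command.

```python
def build_clip_command(src, dst, xmin, ymin, xmax, ymax):
    """Build a gdal_translate call that clips src to a bounding box.

    -projwin expects upper-left x, upper-left y, lower-right x, lower-right y.
    """
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("bounding box is empty or inverted")
    window = [str(v) for v in (xmin, ymax, xmax, ymin)]
    return ["gdal_translate", "-projwin", *window, src, dst]

# Hypothetical bounding box in UTM metres.
command = build_clip_command("stack_18band.tif", "clip.tif",
                             400000, 4890000, 406720, 4896720)
print(" ".join(command))
# To execute it: subprocess.run(command, check=True)
```

Note the argument order: -projwin takes the top edge (ymax) before the bottom edge (ymin), which is a common source of empty outputs.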


Step 3: Run Crop Classification on Hugging Face

Open the Prithvi multi-temporal crop classification demo on Hugging Face Spaces and upload your GeoTIFF.

Click Submit.

The interface shows:

  • Time 1 input
  • Time 2 input
  • Time 3 input
  • The final predicted crop classification map

Processing time varies depending on Hugging Face’s backend. Smaller clips typically finish in a few minutes.

The output preview includes a legend to help identify crop types.

One limitation: Hugging Face currently returns PNG previews, not GeoTIFFs, so the export is not georeferenced.

For real analysis, you’ll want to replicate the inference using Python so that you can write out spatially referenced rasters. (A follow-up tutorial can cover this.)


What the Results Look Like

In the Minnesota example, the model correctly identified:

  • Large areas of corn (yellow)
  • Extensive soybeans (dark green)
  • Pockets of alfalfa
  • Wetlands along rivers
  • Water bodies
  • Developed urban areas

Even without tuning or custom training data, the model produced highly realistic crop maps.

This demonstrates why foundation models are becoming essential for Earth observation tasks.
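
Once you have per-class pixel counts (which requires a georeferenced output rather than the PNG preview), turning them into areas is simple arithmetic, because each HLS pixel covers 30 m × 30 m. The counts below are made up purely for illustration and are not results from the Minnesota example.

```python
PIXEL_SIZE_M = 30  # HLS pixel size in metres

def class_areas_ha(pixel_counts):
    """Convert per-class pixel counts to hectares (1 ha = 10,000 square metres)."""
    return {name: count * PIXEL_SIZE_M ** 2 / 10_000
            for name, count in pixel_counts.items()}

# Made-up counts for illustration only.
counts = {"corn": 20_000, "soybeans": 15_000, "alfalfa": 1_000}
areas = class_areas_ha(counts)
for name, hectares in areas.items():
    print(f"{name}: {hectares:.1f} ha")
```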


Scaling Up: What If You Want to Classify a Whole State?

The Hugging Face UI is great for learning, but not built for production-scale workloads.

If you want to:

  • Process states or countries
  • Run hundreds of time windows
  • Integrate classification into pipelines
  • Scale distributed inference

then you’ll want a cloud processing engine.
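
Whatever engine you use, large rasters are usually split into fixed-size windows before inference. The sketch below generates such windows for a full HLS tile (3660 × 3660 pixels). The 224-pixel window size is an assumption about the chip size the model was fine-tuned on, so check the model card before relying on it. The function name is illustrative.

```python
def tile_windows(width, height, tile=224):
    """Yield (col_off, row_off, w, h) windows that cover a raster."""
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            w = min(tile, width - col_off)
            h = min(tile, height - row_off)
            yield (col_off, row_off, w, h)

# A full HLS tile is 3660 x 3660 pixels at 30 m.
windows = list(tile_windows(3660, 3660))
print(len(windows), "windows; last:", windows[-1])
```

Windows along the right and bottom edges come out smaller than the rest, so a real pipeline would pad them or overlap neighbouring windows before passing them to the model.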

The video notes that Wherobots supports foundation model inference at large scale. A follow-up tutorial will cover how to run multi-temporal classification across massive areas using distributed compute.


Key Skills You’ll Learn from This Tutorial

By following this workflow, you gain:

  • A working understanding of Earth observation foundation models
  • Experience navigating NASA Earthdata and downloading HLS Sentinel-2 imagery
  • Practical QGIS skills for stacking and clipping multi-band rasters
  • Knowledge of how Hugging Face hosts and runs geospatial AI models
  • An introduction to multi-temporal analysis techniques
  • A launch point for more advanced geospatial AI workflows

This is the bridge between traditional remote sensing and modern AI-driven analytics.


What to Try Next

If you want to extend this project, here are practical next steps:

  • Run the model across multiple counties to compare crop rotations
  • Export the input rasters and run model inference with Python for georeferenced output
  • Use the results as training data for your own lightweight classifier
  • Explore other Prithvi foundation models (burn scars, flooding, downscaling)
  • Integrate results into a vector or PostgreSQL/PostGIS workflow