GeoAI · Intermediate

Geospatial Foundation Models

Geospatial foundation models are large pre-trained neural networks adapted to understand and reason about spatial data — from satellite imagery and map tiles to GPS traces and place embeddings.

A geospatial foundation model is a large-scale neural network pre-trained on diverse geographic data that can be fine-tuned for a wide range of downstream spatial tasks — without requiring task-specific training from scratch. These models represent the convergence of the foundation model paradigm (popularised by GPT and BERT in NLP) with the unique challenges of geographic information science.

What Makes Geospatial Data Unique

Standard foundation models assume inputs that are either sequential (text, audio) or grid-like (images). Geographic data breaks both assumptions:

  • Irregular topology — spatial relationships don’t follow a regular grid; a point’s neighbours depend on the geometry of roads, rivers, or administrative boundaries
  • Multi-resolution — meaningful patterns exist simultaneously at building, neighbourhood, city, and continental scales
  • Semantic ambiguity — the same location can mean different things depending on context (a park is recreational space, an urban heat sink, and a biodiversity corridor simultaneously)
  • Temporal dynamics — places change; a model trained on 2020 data may be wrong about the same location in 2025

Architectural Approaches

Three main architectural families dominate the field:

Transformer-based encoders adapt BERT-style masked modelling to geospatial tokens — map tiles, H3 cells, or place descriptions — creating dense vector representations of locations. Models like GeoBERT and SatMAE fall into this category.
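The masking step at the heart of this approach can be sketched generically. The cell ids and masking ratio below are purely illustrative (they are not drawn from GeoBERT or SatMAE, whose internals differ); the sketch only shows how a trajectory of location tokens becomes a self-supervised training pair.

```python
import random

MASK = "[MASK]"  # placeholder token, as in BERT-style masked modelling

def mask_tokens(tokens, ratio=0.15, seed=0):
    """Hide a fraction of tokens; return (corrupted sequence,
    {position: original token}) as the prediction target."""
    rng = random.Random(seed)
    n = max(1, int(len(tokens) * ratio))
    positions = rng.sample(range(len(tokens)), n)
    corrupted = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = corrupted[pos]
        corrupted[pos] = MASK
    return corrupted, targets

# A short "trajectory" of invented grid-cell tokens visited in order:
seq = ["cell_031", "cell_032", "cell_045", "cell_046", "cell_051"]
corrupted, targets = mask_tokens(seq, ratio=0.3)
# The model is then trained to predict `targets` from `corrupted`.
```

The encoder's job is to fill the masked positions from spatial context, which forces it to learn which locations plausibly co-occur.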

Graph neural networks are a natural fit for spatial data because geographic relationships are fundamentally graphs (road networks, administrative hierarchies, spatial adjacency). Spatial GNNs learn representations that respect these topological constraints.
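A minimal sketch of the core operation, assuming mean aggregation over a toy road network (one GCN-like layer with no learned weights; the junction names and features are invented for illustration):

```python
def message_pass(features, adjacency):
    """One round of neighbourhood averaging: each node's new feature
    vector is the mean of its own and its neighbours' features."""
    updated = {}
    for node, feat in features.items():
        vals = [feat] + [features[n] for n in adjacency.get(node, [])]
        updated[node] = [sum(dim) / len(vals) for dim in zip(*vals)]
    return updated

# Toy graph: three junctions on a road, features = [speed_limit, lanes]
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
feats = {"a": [30.0, 1.0], "b": [50.0, 2.0], "c": [70.0, 2.0]}
feats = message_pass(feats, adj)  # each junction now blends its context
```

Stacking such layers (with learned weight matrices in a real GNN) lets information propagate along the road topology rather than across a pixel grid, which is exactly the constraint irregular spatial data demands.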

Diffusion and generative models are increasingly used for geospatial synthesis — generating plausible building footprints, road networks, or land-use maps from partial observations.

Key Applications

  • Urban analytics — inferring socioeconomic indicators from street imagery or OSM features without ground truth labels
  • Change detection — identifying land-use transitions from multi-temporal satellite imagery
  • Geocoding at scale — resolving ambiguous place references in historical texts to modern coordinates (see Topodex)
  • Route intelligence — building traversability models for active travel that account for safety, amenity, and terrain (see WalkGrid)

The Scale Problem

Pre-training a competitive geospatial foundation model requires enormous datasets: the full OpenStreetMap planet file (~75 GB even as a compressed PBF), multi-year Sentinel-2 archives, and billions of GPS traces. This scale creates practical barriers — most geospatial AI research still happens on city-scale or country-scale subsets rather than global data.

Discrete global grid systems like H3 offer one solution: by tessellating the Earth into a fixed hierarchy of hexagonal cells, they allow models to operate at consistent spatial resolutions regardless of geographic location.
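The fixed-hierarchy idea can be sketched with a simplified square-grid analogue (H3 itself uses hexagons on an icosahedral projection, so the cell ids below are invented and not H3-compatible):

```python
def latlng_to_cell(lat, lng, res):
    """Index a point into a quadtree-style cell at a given resolution:
    each level splits the current lat/lng box into a 2x2 grid."""
    digits = []
    lat0, lat1, lng0, lng1 = -90.0, 90.0, -180.0, 180.0
    for _ in range(res):
        lat_mid, lng_mid = (lat0 + lat1) / 2, (lng0 + lng1) / 2
        quad = (2 if lat >= lat_mid else 0) + (1 if lng >= lng_mid else 0)
        digits.append(str(quad))
        lat0, lat1 = (lat_mid, lat1) if lat >= lat_mid else (lat0, lat_mid)
        lng0, lng1 = (lng_mid, lng1) if lng >= lng_mid else (lng0, lng_mid)
    return "".join(digits)

def cell_to_parent(cell, parent_res):
    """Coarser cells are prefixes of finer ones, so parent lookup
    is just truncation."""
    return cell[:parent_res]

# Any point on Earth gets a cell id at every resolution, and the
# coarse id is recoverable from the fine one:
fine = latlng_to_cell(52.95, -1.15, 10)
coarse = cell_to_parent(fine, 4)
```

This is what makes DGGS attractive for foundation models: a single tokenisation scheme covers the planet, and moving between resolutions is a cheap, deterministic operation rather than a reprojection.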

Current Limitations

Geospatial foundation models remain an active research frontier. Key open problems include representing spatial uncertainty, handling the domain shift between geographic regions, and producing outputs that are legally and cartographically valid (models hallucinate roads and buildings). Attribution and provenance — critical for humanitarian applications — are also poorly handled.

Research at Nottingham

My work on MORPHEME explores GNN-based urban representations across 24 global cities, treating the problem as a self-supervised representation learning task over OpenStreetMap feature graphs. The goal is a model that generalises urban semantics without relying on labelled training data.

Last updated 24 April 2026
