Introducing GEON: A Semantic Format for LLM-Native Spatial Intelligence

A new text-based geospatial notation that bridges machine-optimised GIS formats and human-readable place descriptions for large language model reasoning

"Coordinate arrays tell an LLM where a place is, but nothing about what it is, how it feels, or how people experience it. GEON is a new text-based format that encodes semantic richness alongside geometry — designed for human comprehension and LLM reasoning."

I have published a new preprint on TechRxiv introducing GEON (Geospatial Experience-Oriented Notation) — a text-based format designed to bridge the gap between machine-optimised geospatial data and the kind of rich, semantic place descriptions that large language models (LLMs) can reason about effectively.

The Problem

Established geospatial formats — GeoJSON, Well-Known Text (WKT), CityGML — are optimised for geometric computation and rendering in GIS tools like QGIS and ArcGIS. They are excellent at what they do. But when passed to an LLM, they present a fundamental mismatch:

  • Semantic opacity: coordinate arrays provide no meaning about what a place is or how it functions
  • Relational isolation: spatial relationships between places are implicit, requiring computational geometry to extract
  • Absence of experience: the human experience of a place is left entirely to the LLM to infer
  • Fragmentation of context: data arrives in different schemas, from different systems

Consider a GeoJSON representation of Victoria Square in Birmingham:

{
  "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [-1.9024, 52.4791] },
  "properties": { "name": "Victoria Square" }
}

This tells an LLM where Victoria Square is — but nothing about what it is, how it feels, what surrounds it, or how thousands of people experience it every day.

What GEON Looks Like

GEON encodes identity, geometry, purpose, experiential qualities, spatial relationships, temporal patterns, and data provenance in a readable, indentation-based syntax:

PLACE: Victoria Square, Birmingham
TYPE: public_space
LOCATION: 52.4791, -1.9024
PURPOSE:
  - civic gathering
  - events and festivals
EXPERIENCE:
  openness: high
  sense_of_safety: high (daytime), moderate (nighttime)
  activity_density: moderate
ADJACENCIES:
  - Council House (west)
  - Birmingham Museum & Art Gallery (north)

The format is intentionally human-readable. An urban planner, architect, or community researcher should be able to read and write GEON without GIS training. At the same time, it is structured enough for reliable parsing and round-trip conversion to and from GeoJSON.
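
To make the parsing and GeoJSON conversion claim concrete, here is a minimal Python sketch that reads the two-level subset of GEON shown in the example and emits a GeoJSON Feature. It is not the geon-py API: the function names `parse_geon` and `to_geojson` are hypothetical, and the parser assumes only top-level `KEY: value` lines, indented `- item` lists, and indented `key: value` maps. It also assumes, as the example suggests, that LOCATION is written as latitude, longitude, whereas GeoJSON expects longitude first.

```python
import json

GEON_TEXT = """\
PLACE: Victoria Square, Birmingham
TYPE: public_space
LOCATION: 52.4791, -1.9024
PURPOSE:
  - civic gathering
  - events and festivals
EXPERIENCE:
  openness: high
  sense_of_safety: high (daytime), moderate (nighttime)
  activity_density: moderate
ADJACENCIES:
  - Council House (west)
  - Birmingham Museum & Art Gallery (north)
"""


def parse_geon(text):
    """Parse the two-level GEON subset used in this post into a dict."""
    doc = {}
    current = None  # top-level field that owns the indented lines below it
    for raw in text.splitlines():
        if not raw.strip():
            continue
        line = raw.strip()
        if not raw.startswith(" "):
            key, _, value = line.partition(":")
            current = key.strip().lower()
            # An empty value means a nested block follows on indented lines.
            doc[current] = value.strip() or None
        elif line.startswith("- "):
            if doc[current] is None:
                doc[current] = []
            doc[current].append(line[2:].strip())
        else:
            if doc[current] is None:
                doc[current] = {}
            key, _, value = line.partition(":")
            doc[current][key.strip()] = value.strip()
    return doc


def to_geojson(doc):
    """Build a GeoJSON Point Feature; all other fields become properties."""
    lat, lon = (float(part) for part in doc["location"].split(","))
    properties = {k: v for k, v in doc.items() if k != "location"}
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


place = parse_geon(GEON_TEXT)
feature = to_geojson(place)
print(json.dumps(feature, indent=2))
```

Because every non-geometric field is carried into `properties`, nothing from the GEON input is dropped on the way to GeoJSON, which is the property a round-trip converter needs.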

Design Principles

GEON is guided by seven principles:

  1. Semantic richness over geometric precision — a few metres of positional uncertainty is acceptable when accompanied by rich place description
  2. Human readability — comprehensible without GIS expertise
  3. LLM accessibility — field names are self-explanatory, syntactic overhead is minimised
  4. Interoperability — bidirectional conversion with GeoJSON is fully supported
  5. Hierarchical flexibility — places contain sub-places (a market contains stalls; a building contains floors)
  6. Temporal awareness — footfall varies by hour, character shifts across seasons
  7. Experiential grounding — openness, noise, safety, pace are encoded using standardised ordinal scales

The Specification

GEON defines over 30 fields across six categories:

Category     Key Fields
Identity     place, type, id
Geometry     location, boundary, extent, area
Semantic     purpose, experience, character
Relational   adjacencies, connectivity, contains, viewsheds
Temporal     temporal, lifespan
Provenance   source, confidence, updated

The experience field encodes 13 phenomenological dimensions — openness, enclosure, noise level, social diversity, sense of safety, and more — each with a standardised ordinal scale grounded in environmental psychology research (Lynch, Gehl, Hillier & Hanson).
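
As an illustration of how an ordinal experience value might be handled in code, the Python sketch below turns a value such as `high (daytime), moderate (nighttime)` into comparable ranks. The five-step `SCALE` list is an assumption made for this example; the actual controlled vocabularies are defined in the preprint, and `parse_experience_value` is a hypothetical helper, not part of any reference implementation.

```python
import re

# Hypothetical five-step scale, assumed for illustration only.
SCALE = ["very_low", "low", "moderate", "high", "very_high"]

# One entry: an ordinal level, optionally followed by "(condition)".
_ENTRY = re.compile(r"^\s*([a-z_]+)\s*(?:\(([^)]*)\))?\s*$")


def parse_experience_value(value):
    """Map each condition in an experience value to its rank on SCALE."""
    ranks = {}
    for part in value.split(","):
        match = _ENTRY.match(part)
        if match is None or match.group(1) not in SCALE:
            raise ValueError(f"not an ordinal level: {part.strip()!r}")
        condition = match.group(2) or "default"
        ranks[condition] = SCALE.index(match.group(1))
    return ranks


safety = parse_experience_value("high (daytime), moderate (nighttime)")
print(safety)  # {'daytime': 3, 'nighttime': 2}
print(parse_experience_value("high"))  # {'default': 3}
```

Storing ranks rather than labels makes comparisons such as "safer by day than by night" a simple integer check.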

Token Efficiency

One practical concern when working with LLMs is context window usage. To evaluate GEON against existing formats, I measured token counts for an identical place description (Birmingham Bullring Markets, with 16 semantic fields and 38 extractable facts) across five formats:

Format              Tokens   Structured?
GEON                536      ✅
GeoJSON (pretty)    667      ✅
GeoJSON (compact)   475      ✅ (unreadable)
WKT + metadata      436      Partial
Natural language    436      ❌

GEON uses about 20% fewer tokens than pretty-printed GeoJSON (536 versus 667) for identical semantic content, although compact GeoJSON, WKT with metadata, and plain natural language all come in lower. The saving over pretty-printed GeoJSON comes from eliminating JSON’s syntactic overhead (braces, brackets, quotes around every key, and commas between entries); the indentation-based hierarchy carries equivalent structural information at lower token cost.
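
The percentages follow directly from the reported token counts. The short Python sketch below recomputes them; the counts are copied from the table above, and `relative_change` is a helper name introduced only for this example.

```python
# Token counts as reported in the table above.
TOKENS = {
    "GEON": 536,
    "GeoJSON (pretty)": 667,
    "GeoJSON (compact)": 475,
    "WKT + metadata": 436,
    "Natural language": 436,
}


def relative_change(candidate, baseline):
    """Percentage change in token count going from baseline to candidate."""
    return 100.0 * (candidate - baseline) / baseline


for name, count in TOKENS.items():
    if name != "GEON":
        change = relative_change(TOKENS["GEON"], count)
        print(f"GEON vs {name}: {change:+.1f}%")
```

On these figures GEON is 19.6% smaller than pretty-printed GeoJSON, 12.8% larger than compact GeoJSON, and 22.9% larger than either WKT with metadata or natural language.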

Semantic Density

Token count alone is insufficient — what matters is how much meaning each token carries. Measuring semantic density as extractable facts per 100 tokens across formats:

Format      Facts/100 tokens   Lossless?
GEON        8.8                ✅
GeoJSON     6.7                ✅
OSM tags    16.7               ❌
CSV         13.2               ❌

GEON achieves 31% higher semantic density than GeoJSON while remaining fully lossless. OSM tags and CSV achieve higher raw density but cannot encode experiential qualities, temporal patterns, or provenance metadata.
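
The density metric and the 31% figure can be reproduced with a few lines of Python. The function below is a straightforward reading of "extractable facts per 100 tokens"; the 44-facts-in-500-tokens input is an illustrative number chosen for this example, not a measurement from the preprint, while the two densities compared at the end are the reported values.

```python
def facts_per_100_tokens(facts, tokens):
    """Semantic density: extractable facts per 100 tokens."""
    return 100.0 * facts / tokens


def percent_gain(candidate, baseline):
    """How much higher candidate is than baseline, in percent."""
    return 100.0 * (candidate / baseline - 1.0)


# Illustrative input only: 44 facts in a 500-token document.
example_density = facts_per_100_tokens(44, 500)
print(f"{example_density:.1f} facts per 100 tokens")

# Reported densities for GEON and GeoJSON.
gain = percent_gain(8.8, 6.7)
print(f"GEON vs GeoJSON: {gain:.1f}% higher density")
```

The second print gives 31.3%, which matches the rounded 31% quoted in the text.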

LLM Validation

To validate GEON’s practical effectiveness, I prompted two LLMs (NVIDIA Nemotron Nano 9B V2 and Grok 4.1 Fast) to reason about Victoria Square from a GEON document. Both models synthesised the experiential qualities, spatial adjacencies, and temporal safety variation into coherent place narratives — accurately interpreting ordinal scales (“high openness”, “moderate nighttime safety”) as meaningful place characteristics rather than abstract data fields.

The Nemotron model described Victoria Square as “a key social and cultural anchor in the city”; Grok characterised it as “a communal heartbeat for locals and visitors alike”. Both responses integrated every semantic dimension of the GEON input.

Reference Implementations

GEON is released with reference implementations in three languages:

  • Python — for data science and scripting (geon-py)
  • Rust — for high-performance systems (geon-rs)
  • JavaScript/TypeScript — for web applications (geon-js)

All implementations share the same data model and support the full specification, including parsing, generation, validation (with three severity levels), and bidirectional GeoJSON conversion.
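
To give a feel for validation with graded severities, here is a small Python sketch. Everything in it is an assumption made for illustration: the severity names (`error`, `warning`, `info`), the choice of `place` and `type` as required fields, and the rules themselves are hypothetical and are not taken from the specification or from geon-py.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Assumed names; the specification defines its own three levels.
    ERROR = "error"
    WARNING = "warning"
    INFO = "info"


@dataclass
class Issue:
    severity: Severity
    field: str
    message: str


def validate(doc):
    """Return a list of issues for a parsed place (hypothetical rules)."""
    issues = []
    for required in ("place", "type"):
        if not doc.get(required):
            issues.append(Issue(Severity.ERROR, required, "missing required field"))
    location = doc.get("location")
    if location is None:
        issues.append(Issue(Severity.WARNING, "location", "no point location given"))
    else:
        lat, lon = location
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            issues.append(Issue(Severity.ERROR, "location", "coordinates out of range"))
    if "source" not in doc:
        issues.append(Issue(Severity.INFO, "source", "no provenance recorded"))
    return issues


doc = {"place": "Victoria Square, Birmingham", "location": (52.4791, -1.9024)}
issues = validate(doc)
for issue in issues:
    print(f"[{issue.severity.value}] {issue.field}: {issue.message}")
```

Grading issues this way lets a caller reject a document on errors while still accepting one that only lacks optional metadata.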

Read the Preprint

The full manuscript — including the complete format specification, controlled vocabularies, evaluation methodology, and use cases — is available on TechRxiv:

DOI: 10.36227/techrxiv.177160645.54245440/v1

The specification, implementations, interactive web toolkit, and all evaluation data are available under open licences (CC BY 4.0 / MIT) at:

Repository: github.com/jwilliamsresearch/geon
Demonstrator: jwilliams.science/GEON

Dr James Williams
Research Fellow

Researching the intersection of place, maps, and technology.
