Geoparsing

The GeoParser extracts geographic locations from text, detecting both coordinates and place names.

Basic Usage

use spatial_narrative::parser::GeoParser;

let parser = GeoParser::new();
let mentions = parser.extract("Meeting at 40.7128, -74.0060");

assert_eq!(mentions.len(), 1);
let loc = mentions[0].location.as_ref().unwrap();
println!("Latitude: {}, Longitude: {}", loc.lat, loc.lon);

Coordinate Formats

The parser detects three coordinate formats:

Decimal Degrees

40.7128, -74.0060
51.5074 -0.1278
-33.8688, 151.2093

Degrees with Symbols

40.7128°N, 74.0060°W
48.8566N 2.3522E
33.8688°S, 151.2093°E

DMS (Degrees, Minutes, Seconds)

40°42'46"N, 74°0'22"W
51°30'26"N, 0°7'40"W

Place Name Resolution

Use a gazetteer to resolve place names to coordinates:

use spatial_narrative::parser::{GeoParser, BuiltinGazetteer, MentionType};

let gazetteer = BuiltinGazetteer::new();
let parser = GeoParser::with_gazetteer(Box::new(gazetteer));

let text = "The conference was held in Tokyo and participants came from London.";
let mentions = parser.extract(text);

for mention in &mentions {
    if matches!(mention.mention_type, MentionType::PlaceName) {
        println!("Place: {}", mention.text);
    }
}

Built-in Gazetteer

The BuiltinGazetteer includes 2500+ world cities:

  • Major world cities selected by population from GeoNames
  • Cities across 150+ countries with precise coordinates
  • Common aliases (NYC → New York City, SHA → Shanghai, etc.)
  • Data licensed under CC BY 4.0
use spatial_narrative::parser::{BuiltinGazetteer, Gazetteer};

let gazetteer = BuiltinGazetteer::new();

// Direct lookup
if let Some(loc) = gazetteer.lookup("Paris") {
    println!("Paris: {}, {}", loc.lat, loc.lon);
}

// Aliases work too
assert!(gazetteer.contains("NYC"));
assert!(gazetteer.contains("New York City"));

Custom Gazetteer

Implement the Gazetteer trait for custom place databases:

use spatial_narrative::parser::Gazetteer;
use spatial_narrative::core::Location;
use std::collections::HashMap;

struct MyGazetteer {
    places: HashMap<String, (f64, f64)>,
}

impl Gazetteer for MyGazetteer {
    fn lookup(&self, name: &str) -> Option<Location> {
        self.places.get(&name.to_lowercase())
            .map(|(lat, lon)| Location::new(*lat, *lon))
    }

    fn contains(&self, name: &str) -> bool {
        self.places.contains_key(&name.to_lowercase())
    }

    fn all_names(&self) -> Vec<&str> {
        self.places.keys().map(|s| s.as_str()).collect()
    }
}

Configuration

Use LocationPattern to control what the parser detects:

use spatial_narrative::parser::{GeoParser, LocationPattern, BuiltinGazetteer};

// Only detect coordinates, not place names
let mut parser = GeoParser::new();
parser.set_pattern(LocationPattern::coordinates_only());

// Or customize fully
let pattern = LocationPattern {
    detect_decimal: true,
    detect_symbols: true,
    detect_dms: false,
    detect_places: false,
    min_confidence: 0.8,
};
parser.set_pattern(pattern);

LocationMention

Each detected location is returned as a LocationMention:

use spatial_narrative::parser::{GeoParser, MentionType};

let parser = GeoParser::new();
let mentions = parser.extract("Coordinates: 40.7128, -74.0060");

for mention in mentions {
    println!("Text: '{}'", mention.text);
    println!("Position: {}-{}", mention.start, mention.end);
    println!("Type: {:?}", mention.mention_type);
    println!("Confidence: {:.2}", mention.confidence);
    
    if let Some(loc) = mention.location {
        println!("Location: {}, {}", loc.lat, loc.lon);
    }
}

MentionType

  • DecimalDegrees - "40.7128, -74.0060"
  • DegreesWithSymbols - "40.7128°N, 74.0060°W"
  • DMS - "40°42'46"N, 74°0'22"W"
  • PlaceName - "Paris", "New York City"

Confidence Scores

Each mention has a confidence score (0.0 to 1.0):

TypeTypical Confidence
DMS0.99
DegreesWithSymbols0.98
DecimalDegrees0.95
PlaceName0.85

Coordinate formats have higher confidence because they're unambiguous. Place names are lower due to potential ambiguity (e.g., "Paris" could be Paris, France or Paris, Texas).