Named Entity Recognition

The TextAnalyzer provides basic Named Entity Recognition (NER) for extracting structured information from narrative text.

Basic Usage

use spatial_narrative::text::TextAnalyzer;

let analyzer = TextAnalyzer::new();
let text = "Dr. Smith met with CEO Johnson at Google headquarters.";
let entities = analyzer.entities(text);

for entity in entities {
    println!("{:?}: {}", entity.entity_type, entity.text);
}

Entity Types

The analyzer detects seven entity types:

Type          Description              Examples
Person        People's names           "Dr. Smith", "Mr. Johnson"
Organization  Companies, institutions  "Google Inc.", "MIT", "NASA"
Location      Place names              "New York", "Mount Everest"
DateTime      Dates and times          "January 15, 2024", "March 2023"
Numeric       Numbers with units       "$5.5 million", "100 km"
Event         Named events             (custom additions)
Other         Unclassified entities    -

Person Detection

Detects names that are preceded by a common title:

use spatial_narrative::text::{TextAnalyzer, EntityType};

let analyzer = TextAnalyzer::new();
let text = "Dr. Jane Smith and Prof. Bob Johnson attended the meeting.";
let entities = analyzer.entities(text);

let people: Vec<_> = entities.iter()
    .filter(|e| matches!(e.entity_type, EntityType::Person))
    .collect();

for person in people {
    println!("Person: {}", person.text);
}

Recognized titles: Dr., Mr., Mrs., Ms., Miss, Prof., Professor, President, Senator, Governor, Mayor, Chief, Captain, General, Admiral, etc.
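
To make the title rule concrete, here is a minimal, standalone sketch of how a title-based matcher could work. It is not the crate's implementation: the `TITLES` subset, the function name `find_titled_names`, and the "title followed by capitalized words" rule are all assumptions made for illustration.

```rust
/// Hypothetical subset of the titles listed above (assumption, not the crate's list).
const TITLES: [&str; 6] = ["Dr.", "Mr.", "Mrs.", "Ms.", "Prof.", "Senator"];

/// Returns spans such as "Dr. Jane Smith": a title followed by one or more
/// capitalized words. Illustrative only.
fn find_titled_names(text: &str) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    let mut names = Vec::new();
    let mut i = 0;
    while i < words.len() {
        if TITLES.contains(&words[i]) {
            let mut parts = vec![words[i].to_string()];
            let mut j = i + 1;
            while j < words.len() {
                // Strip trailing punctuation so "Smith," still counts as a name part.
                let word = words[j].trim_end_matches(|c: char| !c.is_alphanumeric());
                let capitalized = word.chars().next().map_or(false, |c| c.is_uppercase());
                if !capitalized {
                    break;
                }
                parts.push(word.to_string());
                j += 1;
                // Trailing punctuation was stripped, so the name ends here.
                if word.len() != words[j - 1].len() {
                    break;
                }
            }
            // Keep the match only if at least one name word followed the title.
            if parts.len() > 1 {
                names.push(parts.join(" "));
            }
            i = j;
        } else {
            i += 1;
        }
    }
    names
}
```

A rule like this is fast and predictable, but it will miss names that appear without a title, which is consistent with the limitations listed at the end of this page.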

Organization Detection

Detects organizations by suffix patterns and known acronyms:

use spatial_narrative::text::{TextAnalyzer, EntityType};

let analyzer = TextAnalyzer::new();
let text = "Apple Inc. partnered with MIT and the World Health Organization.";
let entities = analyzer.entities(text);

let orgs: Vec<_> = entities.iter()
    .filter(|e| matches!(e.entity_type, EntityType::Organization))
    .collect();
// Found: "Apple Inc.", "MIT", "World Health Organization"

Recognized patterns:

  • Suffixes: Inc., Corp., LLC, Ltd., Co., Foundation, Institute, University, Organization
  • Acronyms: NASA, FBI, CIA, NATO, WHO, UN, EU, etc.
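
The two rules above can be sketched as a simple standalone check. This is an illustration under stated assumptions, not the crate's code: `ORG_SUFFIXES`, `ORG_ACRONYMS`, and `looks_like_organization` are hypothetical names, and the lists are small subsets of the ones documented here.

```rust
/// Hypothetical subsets of the documented suffixes and acronyms.
const ORG_SUFFIXES: [&str; 5] = ["Inc.", "Corp.", "LLC", "Ltd.", "University"];
const ORG_ACRONYMS: [&str; 4] = ["NASA", "FBI", "CIA", "NATO"];

/// Returns true if `candidate` is a known acronym, or a multi-word phrase
/// that ends with a known organization suffix.
fn looks_like_organization(candidate: &str) -> bool {
    let candidate = candidate.trim();
    if ORG_ACRONYMS.contains(&candidate) {
        return true;
    }
    // Require at least two words so a bare "Inc." is not treated as a name.
    candidate.split_whitespace().count() > 1
        && ORG_SUFFIXES.iter().any(|suffix| candidate.ends_with(*suffix))
}
```

Under these assumptions, `looks_like_organization("Apple Inc.")` and `looks_like_organization("NASA")` are true, while `looks_like_organization("New York")` is false.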

Date Detection

Extracts various date formats:

use spatial_narrative::text::{TextAnalyzer, EntityType};

let analyzer = TextAnalyzer::new();
let text = "The event on January 15, 2024 was rescheduled to March 2024.";
let entities = analyzer.entities(text);

let dates: Vec<_> = entities.iter()
    .filter(|e| matches!(e.entity_type, EntityType::DateTime))
    .collect();

Recognized formats:

  • Full dates: "January 15, 2024", "15 January 2024"
  • Month-year: "March 2024", "Jan 2024"
  • Abbreviated months: "Jan", "Feb", "Mar", etc.
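
As an illustration of the month-year format only, the following standalone sketch pairs a month name (or its three-letter abbreviation) with a four-digit year. The function name `find_month_years` and the token-window approach are assumptions for this example; the crate may well use different rules, and full dates such as "January 15, 2024" are deliberately out of scope here.

```rust
const MONTHS: [&str; 12] = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
];

/// Finds "<month> <year>" pairs such as "March 2024" or "Jan 2024".
/// Illustrative sketch; it does not handle full dates with a day number.
fn find_month_years(text: &str) -> Vec<String> {
    // Split on whitespace and drop surrounding punctuation from each token.
    let tokens: Vec<&str> = text
        .split_whitespace()
        .map(|t| t.trim_matches(|c: char| !c.is_alphanumeric()))
        .collect();

    let mut found = Vec::new();
    for pair in tokens.windows(2) {
        // Accept the full month name or its first three letters.
        let is_month = MONTHS.iter().any(|&m| m == pair[0] || &m[..3] == pair[0]);
        let is_year = pair[1].len() == 4 && pair[1].chars().all(|c| c.is_ascii_digit());
        if is_month && is_year {
            found.push(format!("{} {}", pair[0], pair[1]));
        }
    }
    found
}
```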

Location Detection

Detects common location patterns and major places:

use spatial_narrative::text::{TextAnalyzer, EntityType};

let analyzer = TextAnalyzer::new();
let text = "The company has offices in New York, London, and Tokyo.";
let entities = analyzer.entities(text);

let locations: Vec<_> = entities.iter()
    .filter(|e| matches!(e.entity_type, EntityType::Location))
    .collect();

The built-in location database includes major world cities and countries.

Numeric Detection

Extracts numbers, with or without units:

use spatial_narrative::text::{TextAnalyzer, EntityType};

let analyzer = TextAnalyzer::new();
let text = "The project cost $5.5 million and covered 100 kilometers.";
let entities = analyzer.entities(text);

let numerics: Vec<_> = entities.iter()
    .filter(|e| matches!(e.entity_type, EntityType::Numeric))
    .collect();
// Found: "$5.5 million", "100 kilometers"

Recognized patterns:

  • Currency: "$5.5 million", "€100"
  • Distance: "100 km", "50 miles"
  • Percentages: "25%", "75 percent"
  • General: "1,000", "3.14"
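
This page does not say whether the crate exposes a parsed numeric value alongside the matched text. If you need one, a small helper along the following lines could post-process the text of a percentage entity. `parse_percentage` is a hypothetical function written for this example, not part of the crate.

```rust
/// Converts text such as "25%" or "75 percent" into a fraction (0.25, 0.75).
/// Returns None when the text is not a recognizable percentage.
fn parse_percentage(s: &str) -> Option<f64> {
    let s = s.trim();
    // Accept either the "%" sign or the word "percent" as the suffix.
    let number = s.strip_suffix('%').or_else(|| s.strip_suffix("percent"))?;
    number.trim().parse::<f64>().ok().map(|value| value / 100.0)
}
```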

Tokenization

The analyzer also provides text tokenization:

use spatial_narrative::text::TextAnalyzer;

let analyzer = TextAnalyzer::new();
let text = "Hello, world! This is a test.";

// All tokens
let tokens = analyzer.tokenize(text);
// ["Hello", ",", "world", "!", "This", "is", "a", "test", "."]

// Words only (no punctuation)
let words = analyzer.tokenize_words(text);
// ["Hello", "world", "This", "is", "a", "test"]
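
For readers curious how output like the above can be produced, here is a minimal standalone tokenizer sketch. It is an assumption about the general technique, not the crate's implementation; `tokenize_sketch` and `words_only` are hypothetical names. Note that this naive version would also split "Dr." or "5.5" at the period.

```rust
/// Splits text into runs of alphanumeric characters and single punctuation marks.
fn tokenize_sketch(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for c in text.chars() {
        if c.is_alphanumeric() {
            current.push(c);
        } else {
            // A non-alphanumeric character ends the current word, if any.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            // Punctuation becomes its own token; whitespace is dropped.
            if !c.is_whitespace() {
                tokens.push(c.to_string());
            }
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Keeps only tokens that contain at least one alphanumeric character.
fn words_only(tokens: &[String]) -> Vec<String> {
    tokens
        .iter()
        .filter(|t| t.chars().any(|c| c.is_alphanumeric()))
        .cloned()
        .collect()
}
```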

Sentence Splitting

Split text into sentences:

use spatial_narrative::text::TextAnalyzer;

let analyzer = TextAnalyzer::new();
let text = "First sentence. Second sentence! Third sentence?";
let sentences = analyzer.sentences(text);

assert_eq!(sentences.len(), 3);
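
A naive version of sentence splitting can be sketched in a few lines. This is an illustration only: `split_sentences_sketch` is a hypothetical name, and the rule "end a sentence at every '.', '!' or '?'" is an assumption. How the crate treats abbreviations such as "Dr." is not documented here; this sketch would split after them.

```rust
/// Splits text after every '.', '!' or '?' and trims surrounding whitespace.
fn split_sentences_sketch(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    for c in text.chars() {
        current.push(c);
        if matches!(c, '.' | '!' | '?') {
            let trimmed = current.trim();
            if !trimmed.is_empty() {
                sentences.push(trimmed.to_string());
            }
            current.clear();
        }
    }
    // Keep any trailing text that has no terminating punctuation.
    let rest = current.trim();
    if !rest.is_empty() {
        sentences.push(rest.to_string());
    }
    sentences
}
```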

Custom Locations

Add custom location names:

use spatial_narrative::text::TextAnalyzer;

let mut analyzer = TextAnalyzer::new();
analyzer.add_location("Springfield");
analyzer.add_location("Gotham City");

let text = "The hero saved Gotham City.";
let entities = analyzer.entities(text);
// Now detects "Gotham City" as a location
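
Conceptually, a custom location list behaves like a small gazetteer: a set of names that is checked against the input text. The standalone sketch below shows that idea. The `Gazetteer` type, its `find_in` method, and the case-sensitive substring match are assumptions made for illustration; only the `add_location` name mirrors the call shown above.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a location database (not the crate's type).
struct Gazetteer {
    names: HashSet<String>,
}

impl Gazetteer {
    fn new() -> Self {
        Self { names: HashSet::new() }
    }

    /// Registers a custom place name.
    fn add_location(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }

    /// Returns every registered name that occurs in `text`,
    /// sorted so the result order is deterministic.
    fn find_in(&self, text: &str) -> Vec<&str> {
        let mut hits: Vec<&str> = self
            .names
            .iter()
            .map(String::as_str)
            .filter(|name| text.contains(*name))
            .collect();
        hits.sort();
        hits
    }
}
```

With "Springfield" and "Gotham City" registered, `find_in("The hero saved Gotham City.")` returns only "Gotham City" in this sketch.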

Confidence Scores

Each entity has a confidence score:

use spatial_narrative::text::TextAnalyzer;

let analyzer = TextAnalyzer::new();
let entities = analyzer.entities("Dr. Smith works at NASA.");

for entity in entities {
    // entity_type is printed with {:?}, as in Basic Usage
    println!("{:?}: {} (confidence: {:.2})",
        entity.entity_type,
        entity.text,
        entity.confidence
    );
}

Confidence levels:

  • 0.9+: High confidence (clear patterns like "Dr. Smith")
  • 0.7-0.9: Medium confidence
  • 0.5-0.7: Lower confidence (may need verification)
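
If you want to act on these bands in code, a small helper can map a score to a level. The sketch below is not part of the crate: `ConfidenceLevel` and `confidence_level` are hypothetical, the score is assumed to be an `f64`, and the boundary handling (0.9 counts as high, 0.7 as medium) is an assumption, since the ranges above share their endpoints.

```rust
/// Hypothetical buckets matching the bands listed above.
#[derive(Debug, PartialEq)]
enum ConfidenceLevel {
    High,
    Medium,
    Low,
    /// Scores under 0.5, which the list above does not describe.
    BelowRange,
}

/// Maps a confidence score to a band. Boundaries are assigned to the
/// higher band (an assumption, since the documented ranges overlap).
fn confidence_level(score: f64) -> ConfidenceLevel {
    if score >= 0.9 {
        ConfidenceLevel::High
    } else if score >= 0.7 {
        ConfidenceLevel::Medium
    } else if score >= 0.5 {
        ConfidenceLevel::Low
    } else {
        ConfidenceLevel::BelowRange
    }
}
```

A typical use would be to keep `High` entities as-is and route `Low` ones to manual review.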

Limitations

This is a rule-based NER system, not a machine learning model, which brings the following trade-offs:

  • ✅ Fast and deterministic
  • ✅ No external dependencies
  • ✅ Works offline
  • ❌ May miss unconventional patterns
  • ❌ Limited to English
  • ❌ No context-aware disambiguation

For production NLP tasks requiring high accuracy, consider integrating with external NLP services or ML models.