Performance

Performance characteristics and optimization tips.

Complexity

Indexing Operations

OperationSpatialIndexTemporalIndexSpatiotemporalIndex
InsertO(log n)O(log n)O(log n)
Build (bulk)O(n log n)O(n log n)O(n log n)
Point queryO(log n)O(log n)O(log n)
Range queryO(log n + k)O(log n + k)O(log n + k)
K-nearestO(k log n)N/AN/A

Where n = number of items, k = number of results.

Graph Operations

OperationComplexity
Add nodeO(1)
Add edgeO(1)
Get neighborsO(degree)
Has path (BFS)O(V + E)
Shortest pathO((V + E) log V)
connect_temporal()O(n log n)
connect_spatial(d)O(n²)
connect_thematic()O(n² × tags)

Where V = vertices, E = edges.

Memory Usage

Per-Event Overhead

ComponentApproximate Size
Event base~200 bytes
Location~40 bytes
Timestamp~32 bytes
EventId (UUID)16 bytes
Text (per char)1 byte
Tag (each)~24 bytes + string

Index Overhead

IndexOverhead per item
SpatialIndex~64 bytes
TemporalIndex~48 bytes
SpatiotemporalIndex~112 bytes

Approximate Memory Formula

Memory (MB) ≈ events × (200 + avg_text_len + tags × 30) / 1_000_000
           + index_overhead

Optimization Tips

Bulk Loading

Use from_iter instead of repeated insert:

// ✅ Fast: O(n log n) bulk load
let index = SpatialIndex::from_iter(events, |e| &e.location);

// ❌ Slow: O(n log n) per insert
let mut index = SpatialIndex::new();
for event in events {
    index.insert(event, &event.location);
}

Query Then Filter

Use indexes for initial filtering, then apply additional criteria:

// Get spatial candidates (fast)
let candidates = spatial_index.query_bounds(&bounds);

// Apply additional filters (linear)
let filtered: Vec<_> = candidates.into_iter()
    .filter(|e| e.has_tag("important"))
    .filter(|e| time_range.contains(&e.timestamp))
    .collect();

Avoid O(n²) Operations

The connect_spatial and connect_thematic methods are O(n²):

// For large graphs, consider:

// 1. Sample events
let sample: Vec<_> = events.choose_multiple(&mut rng, 1000).collect();
let mut graph = NarrativeGraph::from_events(sample);
graph.connect_spatial(10.0);  // Now O(sample²)

// 2. Use spatial index for proximity
let index = SpatialIndex::from_iter(events.clone(), |e| &e.location);
for event in &events {
    let nearby = index.query_radius_meters(event.location.lat, event.location.lon, 1000.0);
    // Only connect to nearby events
}

// 3. Skip if graph is too large
if events.len() > 10_000 {
    println!("Skipping spatial connections (too large)");
} else {
    graph.connect_spatial(10.0);
}

Lazy Evaluation

Build indexes only when needed:

struct LazyNarrative {
    events: Vec<Event>,
    spatial_index: Option<SpatialIndex<Event>>,
}

impl LazyNarrative {
    fn get_spatial_index(&mut self) -> &SpatialIndex<Event> {
        if self.spatial_index.is_none() {
            self.spatial_index = Some(SpatialIndex::from_iter(
                self.events.iter().cloned(),
                |e| &e.location
            ));
        }
        self.spatial_index.as_ref().unwrap()
    }
}

Parallel Processing

Use rayon for parallel operations:

use rayon::prelude::*;

// Parallel filtering
let filtered: Vec<_> = events.par_iter()
    .filter(|e| expensive_check(e))
    .collect();

// Parallel analysis
let metrics: Vec<_> = narratives.par_iter()
    .map(|n| SpatialMetrics::from_events(&n.events))
    .collect();

Benchmarks

Index Query Performance

Tested on 100,000 events:

Query TypeTime
Spatial bbox (1% area)~50 μs
Spatial radius (1km)~80 μs
Temporal range (1 month)~40 μs
Combined (1% × 1 month)~100 μs
K-nearest (k=10)~120 μs

Graph Operations

Operation1K events10K events100K events
from_events1 ms10 ms100 ms
connect_temporal2 ms25 ms300 ms
connect_spatial(10km)50 ms5 stoo slow
shortest_path< 1 ms5 ms50 ms
to_dot5 ms50 ms500 ms

I/O Performance

FormatExport 10K eventsImport 10K events
JSON15 ms20 ms
GeoJSON20 ms25 ms
CSV10 ms15 ms

Memory Optimization

String Interning

For repeated tags, consider interning:

use std::collections::HashSet;
use std::sync::Arc;

// Share common strings
let tag_pool: HashSet<Arc<str>> = HashSet::new();

fn intern_tag(pool: &mut HashSet<Arc<str>>, tag: &str) -> Arc<str> {
    if let Some(existing) = pool.get(tag) {
        existing.clone()
    } else {
        let new = Arc::from(tag);
        pool.insert(new.clone());
        new
    }
}

Streaming Processing

For very large datasets, process in chunks:

use std::io::{BufRead, BufReader};

let file = File::open("huge_file.csv")?;
let reader = BufReader::new(file);

for chunk in reader.lines().chunks(10_000) {
    let events: Vec<Event> = chunk
        .filter_map(|line| parse_event(&line.ok()?).ok())
        .collect();
    
    // Process chunk
    process_events(&events);
}