Ongoing 2026 research

AnythingPOI

A fused, confidence-scored point-of-interest dataset combining OpenStreetMap and Overture Maps Foundation data at national scale — published as GeoParquet and PMTiles.

Python · PyOsmium · DuckDB · H3 · Shapely · Pandas · Tippecanoe · GeoParquet · PMTiles · OpenStreetMap · Overture Maps · Wikidata · MapLibre
22.7M POIs published
6 countries
545k cross-source matches
ODbL open licence

Point-of-interest (POI) data underpins navigation, urban analysis, accessibility research, commercial intelligence, and geospatial AI training. Yet no single open source covers all POIs comprehensively. OpenStreetMap (OSM) provides community-verified, hand-tagged data with strong geographic fidelity; Overture Maps Foundation supplies commercial-scale coverage with rich attribute depth. Each fills gaps the other leaves. Used naively together, however, they produce duplicates, conflicting records, and uneven confidence: the same café can appear under slightly different names and coordinates in both sources.

AnythingPOI resolves this through a reproducible five-stage pipeline that fuses both sources into a single, deduplicated, confidence-scored national dataset. The first release, Aurora v0.1 (April 2026), covers six countries and almost 23 million POIs (22,671,048 in total).

Five-Stage Pipeline

1. Ingestion

OSM data is downloaded as compressed PBF files from Geofabrik and parsed with PyOsmium. Only named features are retained; highway, waterway, and landuse geometries are rejected. Polygon features are converted to centroids via Shapely.
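The filtering and centroid rules above can be sketched in plain Python. This is a minimal illustration, not the project's code: in the pipeline these checks would sit inside PyOsmium handler callbacks (`osmium.SimpleHandler` with `node` and `area` methods) and the centroid would come from Shapely. The function names are hypothetical, as is the assumption that the highway/waterway/landuse rejection applies only to way and area geometries.

```python
from typing import Mapping, Sequence, Tuple

# Keys whose way/area geometries are rejected (from the text above).
REJECTED_KEYS = ("highway", "waterway", "landuse")


def keep_osm_feature(tags: Mapping[str, str], is_node: bool) -> bool:
    """Hypothetical filter: keep named features, drop road/water/land-use geometries.

    Assumption: the rejection targets ways and areas only, so a named node
    such as highway=bus_stop is kept.
    """
    if not tags.get("name", "").strip():
        return False  # unnamed features are never retained
    if not is_node and any(key in tags for key in REJECTED_KEYS):
        return False
    return True


def polygon_centroid(ring: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Area-weighted centroid of a simple polygon (no holes) given as (lon, lat) pairs.

    Uses the planar shoelace formula on raw degrees, the same planar
    computation Shapely's Polygon.centroid performs on EPSG:4326 coordinates.
    Works for open or closed rings.
    """
    twice_area = 0.0
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon: zero area")
    return cx / (3 * twice_area), cy / (3 * twice_area)
```

For example, `polygon_centroid([(0, 0), (2, 0), (2, 2), (0, 2)])` gives `(1.0, 1.0)`.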

Overture data is queried directly from S3 using DuckDB’s httpfs extension, with bounding-box filters pushed to the Parquet layer to avoid unnecessary data transfer.
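A minimal sketch of such a query, assuming the public `overturemaps-us-west-2` bucket layout and the `bbox` struct columns (`xmin`, `xmax`, `ymin`, `ymax`) used in recent Overture places releases; column names should be checked against the release actually queried. `RELEASE` is a placeholder for a real release tag, and the helper names are hypothetical. Filtering on `bbox` lets DuckDB skip Parquet row groups whose min/max statistics fall outside the box.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float  # west longitude
    ymin: float  # south latitude
    xmax: float  # east longitude
    ymax: float  # north latitude


def overture_places_sql(bbox: BBox, release: str) -> str:
    """Build a DuckDB query for named Overture places inside bbox (hypothetical helper)."""
    path = f"s3://overturemaps-us-west-2/release/{release}/theme=places/type=place/*"
    return f"""
        SELECT id,
               names.primary      AS name,
               categories.primary AS category,
               bbox.xmin          AS lon,   -- for points, xmin = xmax
               bbox.ymin          AS lat
        FROM read_parquet('{path}', hive_partitioning = 1)
        WHERE bbox.xmin >= {bbox.xmin} AND bbox.xmax <= {bbox.xmax}
          AND bbox.ymin >= {bbox.ymin} AND bbox.ymax <= {bbox.ymax}
          AND names.primary IS NOT NULL
    """


def fetch_overture_places(con, bbox: BBox, release: str):
    """Run the query on an open duckdb connection and return a pandas DataFrame."""
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    con.execute("SET s3_region = 'us-west-2'")
    return con.execute(overture_places_sql(bbox, release)).fetchdf()
```

With the `duckdb` package installed, a call for an approximate Germany box might look like `fetch_overture_places(duckdb.connect(), BBox(5.9, 47.3, 15.0, 55.1), RELEASE)`.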

Both sources are deduplicated by ID and filtered to named features only before any comparison begins.

2. Conflation

All records are indexed to H3 resolution 11 (hexagonal cells averaging roughly 2,150 m²). Candidate pairs are compared only within each record's cell and its six immediate neighbours, reducing comparisons by four to five orders of magnitude relative to brute-force matching.
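The blocking step amounts to a spatial hash join. In this minimal sketch the cell functions are passed in, so the logic runs without the `h3` package; with h3-py v4 they would be `h3.latlng_to_cell(lat, lon, 11)` and `h3.grid_disk(cell, 1)`, which returns a cell plus its ring of six neighbours. The function name and record layout are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Point = Tuple[str, float, float]  # (id, lat, lon)


def candidate_pairs(
    osm: Iterable[Point],
    overture: Iterable[Point],
    cell_of: Callable[[float, float], Hashable],
    neighbours_of: Callable[[Hashable], Iterable[Hashable]],
) -> Iterator[Tuple[str, str]]:
    """Yield (osm_id, overture_id) pairs that share a cell or neighbouring cell."""
    # Index the Overture side once, keyed by cell.
    index: Dict[Hashable, List[str]] = defaultdict(list)
    for pid, lat, lon in overture:
        index[cell_of(lat, lon)].append(pid)
    # Each OSM record probes only its own cell and the surrounding ring.
    for oid, lat, lon in osm:
        for cell in neighbours_of(cell_of(lat, lon)):
            for pid in index.get(cell, ()):
                yield oid, pid
```

Wired to H3, the call would look like `candidate_pairs(osm, overture, lambda lat, lon: h3.latlng_to_cell(lat, lon, 11), lambda c: h3.grid_disk(c, 1))`. Indexing one side once and probing with the other keeps the work roughly proportional to the number of records times the few candidates per neighbourhood, rather than to the product of the two source sizes.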

For each OSM record, an Overture candidate must satisfy three conditions simultaneously:

  • Haversine distance under 50 metres
  • Same Tier-1 category
  • Jaro-Winkler name similarity ≥ 0.85
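The three conditions can be written as a single predicate. This is a sketch, not the project's code: the Jaro-Winkler below is the standard formulation (prefix scale 0.1, common prefix capped at four characters), names are case-folded before comparison as an assumption, and the record fields are hypothetical. A production pipeline would more likely call an optimised library such as RapidFuzz.

```python
import math
from typing import Any, Mapping

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(0, max(len(s1), len(s2)) // 2 - 1)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both matched sequences in order; each out-of-place pair is half a transposition.
    k = mismatched = 0
    for i, ch in enumerate(s1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if ch != s2[k]:
                mismatched += 1
            k += 1
    t = mismatched / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1) -> float:
    """Boost the Jaro score for a shared prefix of up to four characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * scale * (1 - j)


def is_candidate_match(osm: Mapping[str, Any], ovt: Mapping[str, Any]) -> bool:
    """All three thresholds from the text must hold at once."""
    return (
        haversine_m(osm["lat"], osm["lon"], ovt["lat"], ovt["lon"]) < 50
        and osm["tier1"] == ovt["tier1"]
        and jaro_winkler(osm["name"].casefold(), ovt["name"].casefold()) >= 0.85
    )
```

As a sanity check, the textbook pair "MARTHA" and "MARHTA" scores about 0.961.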

The highest-scoring candidate that passes all three thresholds is accepted as the match. When attributes are merged, geographic fields favour OSM, while commercial fields (phone, website, brand) take the Overture value only when OSM has none.
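Selection and merging can be sketched as follows. The candidate score is not defined in the text, so `best_candidate` simply takes precomputed `(score, record)` pairs; the geographic field list is hypothetical, while phone, website and brand are the commercial fields named above. Under the rules as stated, both groups resolve OSM first with Overture as the fallback, and the sketch also records which source supplied each value.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]

# Hypothetical geographic fields; the commercial fields are the ones named in the text.
GEOGRAPHIC_FIELDS = ("name", "lat", "lon", "street", "postcode")
COMMERCIAL_FIELDS = ("phone", "website", "brand")


def best_candidate(scored: List[Tuple[float, Record]]) -> Optional[Record]:
    """Highest-scoring candidate among those that already passed every threshold."""
    return max(scored, key=lambda pair: pair[0])[1] if scored else None


def merge(osm: Record, overture: Record) -> Record:
    """Merge a matched pair, recording which source supplied each field."""
    merged: Record = {"provenance": {}}
    for field in GEOGRAPHIC_FIELDS + COMMERCIAL_FIELDS:
        merged[field] = None
        # OSM first; Overture only fills fields that OSM leaves empty.
        for source, record in (("osm", osm), ("overture", overture)):
            value = record.get(field)
            if value is not None and value != "":
                merged[field] = value
                merged["provenance"][field] = source
                break
    return merged
```

Keeping the two field groups as separate constants leaves room for their policies to diverge in later releases.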

3. Confidence Scoring

Every POI receives a score between 0.01 and 0.99, anchored at a base of 0.50, with additive adjustments from multiple independent signals:

Boosts — dual-source conflation (+0.05), Wikidata presence (+0.15), website (+0.03), phone (+0.02), street address (+0.02), opening hours (+0.02), OSM polygon geometry (+0.03)

Penalties — digit-only names (−0.15), names ≤ 2 characters (−0.10), URLs used as names (−0.10)

Across Aurora v0.1 the mean and median confidence both sit at 0.73 (IQR 0.68–0.79), with only 0.2% of records scoring below 0.5.
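The additive scheme can be sketched in a few lines. The base, weights and clamp come from the text; the field names, the URL heuristic, and the assumption that penalties stack (so a two-digit name such as "12" loses both 0.15 and 0.10) are hypothetical.

```python
import re
from typing import Any, Mapping

BASE = 0.50
# Boost weights from the text; the signal/field names are hypothetical.
BOOSTS = {
    "conflated": 0.05,  # matched in both OSM and Overture
    "wikidata": 0.15,
    "website": 0.03,
    "phone": 0.02,
    "street": 0.02,
    "opening_hours": 0.02,
    "osm_polygon": 0.03,
}
# Hypothetical heuristic for "URL used as a name".
URL_LIKE = re.compile(r"^(https?://|www\.)", re.IGNORECASE)


def confidence(poi: Mapping[str, Any]) -> float:
    score = BASE
    for signal, weight in BOOSTS.items():
        if poi.get(signal):
            score += weight
    name = (poi.get("name") or "").strip()
    if name.isdigit():
        score -= 0.15
    if len(name) <= 2:
        score -= 0.10  # assumed to stack with the digit-only penalty
    if URL_LIKE.match(name):
        score -= 0.10
    # Clamp to [0.01, 0.99], then round away floating-point noise.
    return round(min(0.99, max(0.01, score)), 2)
```

The listed boosts sum to 0.32, so on their own they can lift a score to at most 0.82; the upper clamp at 0.99 only matters if further signals are added.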

4. Taxonomy

A two-level hierarchy maps OSM tags and Overture category strings to a unified classification:

  • 17 Tier-1 categories (e.g. Food & Beverage, Healthcare, Transport)
  • 196 Tier-2 subcategories
  • 577 OSM mapping rules + 1,539 Overture mapping rules = 2,116 rules total

Features matching no rule are flagged as Other / Uncategorised for iterative refinement between releases.
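A rule table of this kind reduces to dictionary lookups with a fallback. The sketch below is illustrative only: the handful of rules, the Tier-2 labels, the Overture category strings, the OSM key precedence, and the reading of "Other / Uncategorised" as a Tier-1/Tier-2 pair are all assumptions.

```python
from typing import Dict, Mapping, Optional, Tuple

Category = Tuple[str, str]  # (Tier-1, Tier-2)
FALLBACK: Category = ("Other", "Uncategorised")  # assumed split of "Other / Uncategorised"

# A few illustrative rules; the real tables hold 577 OSM and 1,539 Overture rules.
OSM_RULES: Dict[Tuple[str, str], Category] = {
    ("amenity", "cafe"): ("Food & Beverage", "Cafe"),
    ("amenity", "pharmacy"): ("Healthcare", "Pharmacy"),
    ("railway", "station"): ("Transport", "Railway Station"),
}
OVERTURE_RULES: Dict[str, Category] = {
    "coffee_shop": ("Food & Beverage", "Cafe"),
    "pharmacy": ("Healthcare", "Pharmacy"),
    "train_station": ("Transport", "Railway Station"),
}
# Hypothetical precedence when an OSM feature carries several mappable keys.
OSM_KEY_ORDER = ("amenity", "shop", "healthcare", "railway", "tourism")


def classify_osm(tags: Mapping[str, str]) -> Category:
    """First matching (key, value) rule in precedence order, else the fallback."""
    for key in OSM_KEY_ORDER:
        if key in tags and (key, tags[key]) in OSM_RULES:
            return OSM_RULES[(key, tags[key])]
    return FALLBACK


def classify_overture(primary_category: Optional[str]) -> Category:
    """Exact lookup on Overture's primary category string, else the fallback."""
    return OVERTURE_RULES.get(primary_category or "", FALLBACK)
```

Logging each fallback hit together with its raw tag or category string gives a ready worklist for adding rules between releases.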

5. Output Formats

Each country ships in two formats:

GeoParquet: 18 category files (one per Tier-1 category, plus one for Other / Uncategorised), Snappy-compressed, sorted by H3 cell, EPSG:4326. 35 columns including geometry, source IDs, full classification path, contact data, and confidence score.
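One way to lay out the per-category files: group records by Tier-1 category, then sort each group by H3 cell so that spatially close POIs tend to share Parquet row groups. The file-naming scheme and field names are hypothetical. Each group could then be written with GeoPandas via `GeoDataFrame.to_parquet(path, compression="snappy")`, which also emits the GeoParquet metadata.

```python
import re
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping

Row = Mapping[str, Any]


def slug(label: str) -> str:
    """'Food & Beverage' -> 'food_beverage' (hypothetical naming scheme)."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def plan_category_files(pois: Iterable[Row], country: str) -> Dict[str, List[Row]]:
    """Group POIs into one file per Tier-1 category, each sorted by H3 cell id."""
    files: Dict[str, List[Row]] = defaultdict(list)
    for poi in pois:
        files[f"{country}_{slug(poi['tier1'])}.parquet"].append(poi)
    for rows in files.values():
        # H3 ids of one resolution are fixed-width hex strings, so string order
        # equals numeric order and keeps children of the same parent cell adjacent.
        rows.sort(key=lambda row: row["h3"])
    return dict(files)
```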

PMTiles v3 — A single archive serving four zoom-band layers: national overview (z4–z6, confidence ≥ 0.70 only), provincial (z7–z10), city-scale (z11–z13), and full street-level data (z14–z16). MapLibre-compatible.
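The zoom-band layout can be expressed as a small configuration. The zoom ranges and the 0.70 threshold come from the text; the layer names are invented, and building each band as its own tippecanoe layer before combining the outputs into one archive with tippecanoe's `tile-join` is a hypothetical approach, not necessarily the one used. Only the `-o`, `-l`, `-Z` and `-z` tippecanoe options appear, and writing `.pmtiles` directly requires a tippecanoe release with PMTiles support.

```python
from typing import Any, List, Mapping, NamedTuple


class ZoomBand(NamedTuple):
    layer: str
    minzoom: int
    maxzoom: int
    min_confidence: float = 0.0


# Zoom ranges and the 0.70 threshold are from the text; layer names are hypothetical.
BANDS = (
    ZoomBand("national", 4, 6, 0.70),
    ZoomBand("provincial", 7, 10),
    ZoomBand("city", 11, 13),
    ZoomBand("street", 14, 16),
)


def bands_for(poi: Mapping[str, Any]) -> List[str]:
    """Layers a POI is written to, given its confidence score."""
    return [b.layer for b in BANDS if poi["confidence"] >= b.min_confidence]


def tippecanoe_args(band: ZoomBand, geojson_path: str) -> List[str]:
    """Argument list for one per-band tippecanoe run (not executed here)."""
    return [
        "tippecanoe",
        "-o", f"{band.layer}.pmtiles",  # output archive
        "-l", band.layer,               # layer name inside the tiles
        f"-Z{band.minzoom}",            # minimum zoom
        f"-z{band.maxzoom}",            # maximum zoom
        geojson_path,
    ]
```

For example, a POI scored 0.65 lands in the provincial, city and street layers but not in the national overview.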

Aurora v0.1 Coverage

Country          | POIs      | Cross-source matches | OSM-only share | Overture-only share
Germany          | 6,763,796 | 213,532              | 35.8%          | 61.0%
Canada           | 5,565,256 | 75,405               | 8.1%           | 90.5%
United Kingdom   | 4,622,174 | 122,517              | 20.5%          | 76.8%
Türkiye          | 2,201,304 | 30,895               | 13.9%          | 84.7%
Netherlands      | 1,782,538 | 50,768               | 15.8%          | 81.4%
Australia        | 1,735,980 | 51,636               | 18.4%          | 78.6%

The source mix varies substantially by country, reflecting OSM contributor density: Germany's strong OSM community yields a 35.8% OSM-only share, while Canada's sparser coverage means 90.5% of records come from Overture alone. Cross-source matches follow the same pattern, at about 3.2% of Germany's records versus about 1.4% of Canada's. Confidence scores travel with every record, letting downstream users apply their own quality thresholds.
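The coverage table can be checked for internal consistency. For every country, matches as a percentage of POIs agree with 100% minus the OSM and Overture shares, which indicates that those two shares count records found in only one source. The figures below are copied from the table; the helper name is hypothetical.

```python
# (POIs, cross-source matches, OSM share %, Overture share %) from the coverage table.
COVERAGE = {
    "Germany": (6_763_796, 213_532, 35.8, 61.0),
    "Canada": (5_565_256, 75_405, 8.1, 90.5),
    "United Kingdom": (4_622_174, 122_517, 20.5, 76.8),
    "Türkiye": (2_201_304, 30_895, 13.9, 84.7),
    "Netherlands": (1_782_538, 50_768, 15.8, 81.4),
    "Australia": (1_735_980, 51_636, 18.4, 78.6),
}


def share_check(pois: int, matches: int, osm_pct: float, ovt_pct: float) -> tuple:
    """Return (matches as % of POIs, 100 - OSM % - Overture %)."""
    return 100 * matches / pois, 100 - osm_pct - ovt_pct


if __name__ == "__main__":
    for country, row in COVERAGE.items():
        matched, residual = share_check(*row)
        print(f"{country:15} matched {matched:.2f}%  residual {residual:.2f}%")
    print("total POIs:", sum(r[0] for r in COVERAGE.values()))
    print("total matches:", sum(r[1] for r in COVERAGE.values()))
```

The two percentages agree to within 0.05 percentage points in every row, well inside the ±0.1 that rounding the two published shares allows. Summed across the six countries, the table gives 22,671,048 POIs and 544,753 cross-source matches.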

Licensing

The output datasets are published under the Open Database License (ODbL) 1.0. Overture's CDLA Permissive 2.0 licence permits incorporation into ODbL-licensed derivative works. Any public use of these databases, or of works produced from them, must include the attribution "© OpenStreetMap contributors". Derivative databases that are publicly used must also be released under ODbL.

Limitations

Aurora v0.1 is a static snapshot; live sources change continuously. Conservative matching thresholds may miss transliterated names or address-only records. All output is point-based — polygon features are collapsed to centroids. Address completeness varies significantly by region and category, and names appear in their original language without translation.