Dr James Williams.
GeoAI · Foundation Models · Urban Intelligence
I build geospatial AI systems that operate at scale — from GNN-based urban embedding models spanning 24 cities, to cloud-native conflict data infrastructure indexing 10 million records. My work bridges spatial machine learning with real-world impact in humanitarian, urban, and policy contexts.
About
I build geospatial AI systems at the boundary of spatial computing and machine learning — designing infrastructure that translates large, messy geographic datasets into meaningful representations of places, people, and events.
My work is inherently cross-disciplinary, sitting between GIScience, computer science, and the social sciences. Whether indexing ten million conflict records, embedding urban street networks across 24 global cities, or routing cyclists through 49 fused datasets, the common thread is scale — and the conviction that spatial data, treated carefully, can support better decisions in humanitarian, urban, and policy contexts.
Research Systems
Key Systems Built
MORPHEME
GNN-based embedding model utilising OpenStreetMap data to autonomously quantify urban characteristics — vibrancy, safety, and function — across 24 global cities without manual labelling.
CDISaW
Central Data Infrastructure for Slavery and War — a cloud-native conflict data platform indexing over 10 million records and delivering accessible outputs for interdisciplinary research at the Leverhulme Centre.
Topodex
Cloud-based, LLM-native geocoding system resolving over 894 million places worldwide, incorporating a supervised learning feedback loop for continuous accuracy improvement.
WalkGrid
Cloud-based active travel routing engine fusing 49 datasets — including Earth Observation, Ordnance Survey, OpenStreetMap, and Overture — across a 150,000-cell H3 grid for millisecond personalised route generation at city scale.
Areas of Focus
Research Themes
GeoAI & Representation Learning
Geospatial foundation models, graph neural networks for spatial representation, and discrete global grid systems (H3) for city-scale analysis.
- Spatial GNNs
- Place embeddings
- OSM big-data parsing
- H3 / DGGS
Conflict & Humanitarian Data
Cloud-native infrastructure for conflict documentation and displacement analysis, supporting interdisciplinary research into slavery and war.
- CDISaW data platform
- Topodex geocoding
- Leverhulme Centre
- Rights Lab, Nottingham
Urban Intelligence & Mobility
Data-driven analysis of urban place, active transportation, and mobility patterns using satellite data, crowdsourced trajectories, and civic APIs.
- MORPHEME urban model
- WalkGrid routing
- PARM civic dashboards
Portfolio
Featured Projects
CDISaW
Centralised Data Infrastructure for Slavery and War — a unified query layer over dispersed, heterogeneous datasets on slavery and war across space and time.
View project
PlaceCrafter
A web-based geospatial framework for identifying and visualising 'platial' functional regions by clustering OpenStreetMap Points of Interest.
View project
Scholarship
Selected Publications
Macro-Regional Spatial Patterns of Ambient Air Pollution and Avoidable Hospitalizations for Community-Acquired Pneumonia in Mexico (2013–2020)
C. Hernandez-Nava, M. Mata-Rivera, R. Zagal-Flores, J. Williams
Ambient air pollution significantly contributes to respiratory illnesses, yet little is known about how industrial emissions are linked to preventable hospitalizations across atmospheric basins in middle-income countries. This study develops a basin-based geomatics framework to examine the spatial and temporal relationship between industrial pollutants and age- and sex-adjusted avoidable hospitalizations for community-acquired pneumonia (PQI 11) in Mexico from 2013 to 2020. Using state-level data grouped into eight macro-regions, we combine bivariate choropleth maps, Pearson correlations, linear regression, and longitudinal time-series analysis to identify spatial clusters of high risk and to estimate regional sensitivities to changes in PM2.5, SO2, NOx, and volatile organic compound emissions. The findings reveal notable regional differences: northern border states and the Mexico City metropolitan basin form persistent high–high clusters where elevated emissions coincide with high PQI 11 rates, while coastal and peninsular regions show lower hospitalization burdens despite medium emission levels. Although national industrial PM2.5 emissions decreased over the study period, several macro-regions (particularly CDMX_Edomex, Centro, and Centro Norte) experienced significant increases in avoidable hospitalizations and decoupled emission–health patterns. Correlation matrices and regression slopes suggest that the strength and even direction of links between pollutants and PQI 11 vary across macro-regions, with emission-responsive patterns in Centro Norte and weak or inverse relationships in Peninsula and Pacifico Sur. These findings demonstrate that national averages obscure critical spatial disparities and highlight the value of basin-based geomatics approaches for regional air-quality governance, spatial decision support, and primary-care planning aimed at reducing preventable respiratory hospitalizations.
AnythingPOI - Australia POI Dataset v0.1
J. Williams
A unified, open point-of-interest (POI) dataset for Australia containing 1,735,980 POIs produced by the AnythingPOI pipeline, which fuses OpenStreetMap and Overture Maps Foundation data using H3-indexed spatial conflation, Jaro-Winkler name matching, and multi-signal confidence scoring.
Source breakdown:
- OSM-only: 320,149 (18.4%), from OpenStreetMap contributors (ODbL 1.0)
- Overture-only: 1,364,195 (78.6%), from Overture Maps Foundation (CDLA-Permissive-2.0)
- Conflated (both sources matched): 51,636 (3.0%)
Top categories: Professional & Business, Retail, Food & Beverage, Transportation, Healthcare.
Full taxonomy: 18 Tier-1 categories, 196 Tier-2 subcategories.
Contents: GeoParquet files (one per Tier-1 category), PMTiles v3 for interactive map visualisation, and coverage statistics CSVs. Each POI carries a confidence_score (0.01–0.99) reflecting the strength of the conflation evidence across spatial, name, website, phone, postcode, and Wikidata signals.
Attribution: This dataset contains information from OpenStreetMap (© OpenStreetMap contributors, ODbL 1.0, openstreetmap.org/copyright) and Overture Maps Foundation (CDLA-Permissive-2.0, overturemaps.org).
License: Open Database License (ODbL) 1.0. Any public use of this database or works produced from it must include the above attribution. Derivative databases must also be released under ODbL.
AnythingPOI - Canada POI Dataset v0.1
J. Williams
A unified, open point-of-interest (POI) dataset for Canada containing 5,565,256 POIs produced by the AnythingPOI pipeline, which fuses OpenStreetMap and Overture Maps Foundation data using H3-indexed spatial conflation, Jaro-Winkler name matching, and multi-signal confidence scoring.
Source breakdown:
- OSM-only: 451,872 (8.1%), from OpenStreetMap contributors (ODbL 1.0)
- Overture-only: 5,037,979 (90.5%), from Overture Maps Foundation (CDLA-Permissive-2.0)
- Conflated (both sources matched): 75,405 (1.4%)
Top categories: Professional & Business, Retail, Food & Beverage, Healthcare, Services.
Full taxonomy: 18 Tier-1 categories, 196 Tier-2 subcategories.
Contents: GeoParquet files (one per Tier-1 category), PMTiles v3 for interactive map visualisation, and coverage statistics CSVs. Each POI carries a confidence_score (0.01–0.99) reflecting the strength of the conflation evidence across spatial, name, website, phone, postcode, and Wikidata signals.
Attribution: This dataset contains information from OpenStreetMap (© OpenStreetMap contributors, ODbL 1.0, openstreetmap.org/copyright) and Overture Maps Foundation (CDLA-Permissive-2.0, overturemaps.org).
License: Open Database License (ODbL) 1.0. Any public use of this database or works produced from it must include the above attribution. Derivative databases must also be released under ODbL.
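Both AnythingPOI descriptions above mention Jaro-Winkler name matching. For readers unfamiliar with the measure, here is a minimal pure-Python sketch of the standard Jaro-Winkler similarity. It is an illustration only, not the AnythingPOI implementation: the function names, the prefix scale of 0.1, and the 4-character prefix cap are the textbook defaults and are assumptions here, and the pipeline's actual normalisation, thresholds, and confidence scoring are not shown.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (textbook defaults)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


# Classic reference pair: similarity is approximately 0.961.
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))
```

In a conflation setting, a score like this would typically be computed on normalised names (for example, lowercased and stripped of punctuation) for candidate pairs that already fall in the same or neighbouring H3 cells, so that name similarity is only one of several signals.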
Writing
Latest from the Blog
Get in Touch
Open to research collaboration, grant partnerships, and PhD supervision enquiries.