Halesowen College · T Level Data Analytics · Friday 6 March 2026
Research Fellow in Slavery and War: Geospatial & Data Science
Rights Lab & Leverhulme Centre
University of Nottingham
Previously Lecturer in Computer Science,
Birmingham Newman University
jwilliams.science
Geospatial data science · IoT · Human mobility · Place-based sensing · Modern slavery mapping
STEM Partner — here to support your project. You own the work; I help you do it well.
I'll join sessions, answer questions, and help you apply real data-science methods to your findings.
Composition · Pollutants · Standards · Health
Clean, dry air at sea level, by volume: nitrogen ≈78% · oxygen ≈21% · argon ≈0.93% · carbon dioxide ≈0.04%
Air quality describes the degree to which the air is free from pollutants that could harm human health, ecosystems, or the built environment.
Indoor vs outdoor: Indoor air can be 2–5× more polluted than outdoor air, for example because of VOCs released by cleaning products.
Think about your journey to college this morning.
(5 minutes) Write down 3 things you walked or drove past that might be producing air pollution. (Examples: buses, factories, car parks, busy junctions, construction.)
(5 minutes) Compare your list with the person next to you. Discuss: Did you identify the same sources? Were any surprising?
💡 We'll revisit your list when we look at monitoring site types later in the session.
PM (particulate matter): dust, soot, tyre wear, brake particles. PM2.5 (<2.5 µm) bypasses the body's defences and lodges deep in the lungs. Linked to an estimated 29,000 early deaths per year in the UK.
NO₂ (nitrogen dioxide): produced by diesel engines and gas boilers. Inflames airways and reduces lung function. Highest near junctions, where idling traffic is the main source.
VOCs (volatile organic compounds): released by solvents, paints, cleaning products, cooking, and vehicle exhausts. They react with NOx in sunlight to form ground-level ozone. Our BME680 detects these.
O₃ (ground-level ozone): not emitted directly; formed from VOCs + NOx in sunlight. Higher in summer. Damages lung tissue even at moderate concentrations. Rural areas are often worse than city centres.
The UK uses the Daily Air Quality Index (DAQI), a scale from 1 to 10
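The DAQI groups its ten index values into four bands: Low (1 to 3), Moderate (4 to 6), High (7 to 9) and Very High (10). Here is a minimal Python sketch of that lookup. The function name `daqi_band` is ours, for illustration only, and is not part of any official tool.

```python
def daqi_band(index: int) -> str:
    """Return the DAQI band name for an index value from 1 to 10."""
    if not 1 <= index <= 10:
        raise ValueError("DAQI index must be between 1 and 10")
    if index <= 3:
        return "Low"
    if index <= 6:
        return "Moderate"
    if index <= 9:
        return "High"
    return "Very High"


# Print the band for a few sample index values.
for value in (2, 5, 8, 10):
    print(value, daqi_band(value))
```

The official index is worked out from pollutants such as PM2.5, NO₂ and O₃, so treat it as context for published forecasts rather than something our own sensor can produce.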
WHO annual mean limits (2021 guidelines), revised downward significantly from the 2005 guidelines
Birmingham's city centre regularly exceeded WHO NO₂ limits before the Clean Air Zone launched in 2021.
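To see how far the WHO guideline values moved, here is a small sketch comparing the 2005 and 2021 annual mean values for three pollutants. The guideline numbers are the published WHO values in µg/m³. The function names and the sample reading of 30 µg/m³ are made up for illustration and are not real Birmingham data.

```python
# WHO annual mean guideline values in µg/m³, 2005 and 2021 editions.
WHO_ANNUAL_MEAN = {
    "PM2.5": {"2005": 10, "2021": 5},
    "PM10": {"2005": 20, "2021": 15},
    "NO2": {"2005": 40, "2021": 10},
}


def percent_reduction(pollutant: str) -> float:
    """Percentage by which the 2021 value is lower than the 2005 value."""
    old = WHO_ANNUAL_MEAN[pollutant]["2005"]
    new = WHO_ANNUAL_MEAN[pollutant]["2021"]
    return 100 * (old - new) / old


def exceeds_guideline(pollutant: str, annual_mean: float, edition: str = "2021") -> bool:
    """True if an annual mean is above the guideline for the chosen edition."""
    return annual_mean > WHO_ANNUAL_MEAN[pollutant][edition]


for name in WHO_ANNUAL_MEAN:
    print(name, f"{percent_reduction(name):.0f}% lower in 2021")

# Hypothetical annual mean NO2 reading, checked against both editions.
sample_no2 = 30.0
print(exceeds_guideline("NO2", sample_no2, "2005"), exceeds_guideline("NO2", sample_no2, "2021"))
```

The same site can pass the 2005 value and fail the 2021 one, which is why it matters which edition a report is quoting.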
Open Google Maps (or use your local knowledge) and look at the area around Halesowen College.
(5 minutes) Find: the nearest main road, a green or open area, and the main car park. Mark or note each one.
(5 minutes) Rank these three locations from worst to best air quality. Write a one-sentence justification for your top choice.
(5 minutes) Share back with the room: What time of day do you think pollution peaks at the worst location? Why?
💡 There are no wrong answers here — we're building hypotheses we'll test with real data over the coming months.
Hardware · Capabilities · Limitations
Waveshare BME680 module — one board, four environmental measurements
Temperature: −40 to 85 °C range · ±1.0 °C accuracy
Can detect heat islands, south-facing walls, ventilation hotspots
Humidity: 0–100 %RH · ±3 %RH accuracy
High humidity amplifies perceived air quality impact; dampness promotes mould VOCs
Pressure: 300–1100 hPa · ±0.6 hPa accuracy
High pressure often means still, settled air and slower dispersion of pollutants; low pressure usually brings wind and rain that clear them. Useful context for other readings
Gas (VOC): Bosch IAQ index via BSEC library
Responds to ethanol, acetone and other VOCs, plus some combustion gases; BSEC also estimates a CO₂-equivalent. All values are relative, not absolute concentrations
Important limitation: We do not directly measure PM2.5, PM10, NO₂, or O₃, but the VOC index combined with temperature and humidity still reveals meaningful local variation and patterns.
Understanding the limitations makes your methodology stronger, not weaker.
The BME680 heats up slightly during VOC measurement, so its temperature reads a little high. Correct readings for this self-heating. Placement matters: avoid direct sun.
The IAQ index is comparative — it tells you if air is getting better or worse, not the exact ppb of a specific gas. Useful for trends and comparisons.
VOC sensors need a "burn-in" period (several days) to stabilise. Early readings may vary. We'll account for this in data cleaning (Session 05).
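Two of the limitations above turn into simple cleaning steps: drop everything recorded during burn-in, then subtract a fixed self-heating offset from temperature. This is a rough sketch only. The three-day burn-in, the 1.5 °C offset, the field names and the dates are all assumptions for illustration; you would measure your own offset against a reference thermometer.

```python
from datetime import datetime, timedelta

BURN_IN = timedelta(days=3)  # assumed burn-in length
TEMP_OFFSET_C = 1.5          # hypothetical self-heating offset in °C


def clean_readings(readings, start, burn_in=BURN_IN, temp_offset_c=TEMP_OFFSET_C):
    """Drop burn-in readings and correct temperature for self-heating.

    `readings` is a list of dicts with "time" (datetime), "temp_c" and "iaq".
    The input list is left unchanged; corrected copies are returned.
    """
    cutoff = start + burn_in
    cleaned = []
    for reading in readings:
        if reading["time"] < cutoff:
            continue  # still inside the burn-in period
        fixed = dict(reading)
        fixed["temp_c"] = round(reading["temp_c"] - temp_offset_c, 2)
        cleaned.append(fixed)
    return cleaned


# Made-up readings: one during burn-in, one after it.
deployed = datetime(2026, 1, 5, 9, 0)
sample = [
    {"time": deployed + timedelta(days=1), "temp_c": 21.0, "iaq": 80},
    {"time": deployed + timedelta(days=4), "temp_c": 20.0, "iaq": 60},
]
print(clean_readings(sample, deployed))
```

Write both values into your methodology so anyone reading your results knows exactly what was removed and adjusted.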
Research context: Low-cost sensor networks are a growing area of environmental science. Papers from the Urban Observatory (Newcastle) and OpenAQ show they're genuinely useful when calibrated and interpreted carefully.
Further reading: Snyder et al. (2013) "The changing paradigm of air pollution monitoring" — Environmental Science & Technology. Lewis & Edwards (2016) in Nature on sensor networks.
Questions · Locations · Methodology
Data analytics relies on rigorous scientific methodology.
Notice a phenomenon (e.g., smog near a road) and ask a specific, testable question about it. This drives the whole project.
Propose a measurable expected outcome based on existing knowledge (e.g., "VOCs will peak during morning drop-off").
Deploy sensors with a controlled methodology. Consistency is key: if your methodology changes halfway, readings from before and after the change are no longer comparable.
Process the data, look for statistical significance, and evaluate if it supports or refutes your hypothesis.
The difference between "just numbers" and "data-driven evidence" is the rigorous methodology that produced those numbers.
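The analysis step can be sketched with Welch's t-test, one common way to ask whether two sites differ by more than chance would explain. This is an illustrative choice, not the only valid test. The readings below are invented, and `welch_t` is our own helper built on Python's standard library.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variance divided by sample size, for each group.
    se_a = statistics.variance(sample_a) / len(sample_a)
    se_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Invented IAQ index readings from two sites.
car_park = [120, 135, 128, 140, 132]
courtyard = [95, 102, 99, 110, 104]

t_stat, dof = welch_t(car_park, courtyard)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t value suggests a real difference between sites. To turn it into a p-value you need the t distribution, for example `scipy.stats.ttest_ind(car_park, courtyard, equal_var=False)` if SciPy is installed.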
A good research question is specific, measurable, and answerable with the data you can actually collect.
"Is the air quality bad at our college?"
"How does the weather affect the air?"
"Is it worse near the road?"
"Does VOC index differ significantly between the car park and the main courtyard during peak arrival times (08:30–09:00)?"
"How does relative humidity vary between the north and south faces of the main building across a full week?"
"Is there a correlation between temperature and IAQ index at the main entrance between 11:00 and 14:00?"
Draft your own research question for the HalesAir project.
(5 minutes) Use this template as a starting point:
"Does [measurement variable] differ between [location A] and [location B] during [specific time or condition]?"
(5 minutes) Share with a neighbour. Check your question against criteria (variable? testable?) and give feedback.
💡 Your question will evolve as you collect data — that's normal. Getting a clear starting hypothesis is what matters today.
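One of the example questions earlier asks whether temperature and IAQ index are correlated at the main entrance between 11:00 and 14:00. A hedged sketch of how that could be tested: keep only readings inside the time window, then compute Pearson's correlation coefficient. The field names and readings are made up for illustration.

```python
import math
from datetime import time


def in_window(readings, start=time(11, 0), end=time(14, 0)):
    """Keep readings whose "time" falls inside the window (inclusive)."""
    return [r for r in readings if start <= r["time"] <= end]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented readings; only those between 11:00 and 14:00 are used.
readings = [
    {"time": time(10, 30), "temp_c": 12.1, "iaq": 55},
    {"time": time(11, 0), "temp_c": 13.0, "iaq": 60},
    {"time": time(12, 15), "temp_c": 14.2, "iaq": 72},
    {"time": time(13, 40), "temp_c": 15.1, "iaq": 70},
    {"time": time(14, 30), "temp_c": 15.4, "iaq": 90},
]
window = in_window(readings)
r = pearson_r([row["temp_c"] for row in window], [row["iaq"] for row in window])
print(f"{len(window)} readings in window, r = {r:.2f}")
```

Values near +1 or −1 indicate a strong linear relationship and values near 0 a weak one. Correlation alone does not show that temperature causes the change.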
The most valuable datasets come from contrast — a car park vs. a courtyard vs. a green space will tell us far more than three units in similar spots.
DEFRA classification — used to contextualise and compare datasets
Kerbside: within 1 m of a busy road. Captures highest traffic-related pollution. Directly comparable to DEFRA roadside reference data. Best for vehicles-as-source studies.
Roadside: 1–5 m from traffic. Captures emissions with some dispersion already occurring. Most comparable to pedestrian exposure at crossings and bus stops.
Urban background: away from direct emission sources, such as rooftops, internal courtyards, parks. Represents the general "background" level for the area. Useful as a baseline site.
Indoor / sheltered: corridors, covered walkways, building entrances. Captures people-generated pollutants (VOCs from deodorant, cleaning products, cooking) and infiltration from outdoors.
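The four site descriptions above can be turned into a rough first-pass classifier. DEFRA's guidance uses the names kerbside, roadside and urban background for the outdoor types. The sketch below applies the 1 m and 5 m thresholds from the slide and treats anything further away as background, which is a simplification: a real background site also needs to be away from other emission sources. The function name and the `indoors` flag are our own.

```python
def site_type(distance_to_road_m: float, indoors: bool = False) -> str:
    """Suggest a site type from distance to the nearest busy road."""
    if indoors:
        return "indoor / sheltered"
    if distance_to_road_m < 0:
        raise ValueError("distance cannot be negative")
    if distance_to_road_m <= 1:
        return "kerbside"
    if distance_to_road_m <= 5:
        return "roadside"
    return "background"


# Hypothetical candidate sites with estimated distances in metres.
print(site_type(0.5))                # right beside the road
print(site_type(3))                  # bus stop set back from traffic
print(site_type(60))                 # internal courtyard
print(site_type(10, indoors=True))   # covered walkway
```

Use the result as a starting label, then check it against the full description of each site type before writing it into your methodology.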
Further reading: DEFRA "Local Air Quality Management Technical Guidance" (TG22) — free to download. Explains site classification in detail.
Evaluate two potential monitoring sites.
Pick two locations you're considering. For each, score 1–3 on: pollution exposure · power & Wi-Fi access · scientific contrast with other sites.
Using the DEFRA classification from the previous slide — which site type does each location best match? Write it next to each location.
Which location scores higher overall? That's your leading candidate — you'll confirm it during the main task.
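If you want to check your arithmetic, the scoring in this activity can be sketched in a few lines. The site names and scores below are invented examples, and the short criterion keys ("exposure", "access", "contrast") are our shorthand for pollution exposure, power & Wi-Fi access, and scientific contrast.

```python
CRITERIA = ("exposure", "access", "contrast")


def total_score(scores: dict) -> int:
    """Add up the three criterion scores, each of which must be 1, 2 or 3."""
    for name in CRITERIA:
        if not 1 <= scores[name] <= 3:
            raise ValueError(f"{name} must be scored from 1 to 3")
    return sum(scores[name] for name in CRITERIA)


def leading_candidate(sites: dict) -> str:
    """Return the name of the site with the highest total score."""
    return max(sites, key=lambda name: total_score(sites[name]))


# Hypothetical scores for two candidate sites.
sites = {
    "north car park entrance": {"exposure": 3, "access": 2, "contrast": 3},
    "main courtyard": {"exposure": 1, "access": 3, "contrast": 2},
}
for name, scores in sites.items():
    print(name, total_score(scores))
print("Leading candidate:", leading_candidate(sites))
```

If two sites tie, this sketch simply returns the first one listed, so a tie is a prompt to think harder about which criterion matters most for your question.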
Working individually — produce three outputs before the end of the session.
Choose your monitoring site. Be precise — not "outside" but "beside the north car park entrance, under the covered walkway". Note its DEFRA site type.
Write your research question. Use the template from Activity C. Check that it names a variable and two comparable locations or times, and that it is answerable with the BME680.
Sketch your deployment plan. A hand-drawn diagram showing: where the unit sits, where the power socket is, approximate Wi-Fi coverage, and how the enclosure is mounted.
💡 These three outputs will form part of your project methodology write-up. Keep them — you'll build on them in every session from here.
It's time to test your methodology against peer review from the rest of the room.
Where is your site, and why did you pick it?
What is your specific research question?
What potential issues have you identified regarding power, security, or Wi-Fi?
Date TBC · Hardware & Electronics
Getting your hands on the Raspberry Pi Pico W and BME680. Breadboarding your first circuit, writing your first MicroPython script to read live sensor data, and understanding GPIO & I2C communication.
To bring next time: Your site choice, research question, and deployment sketch from today.
Optional reading: MicroPython docs at micropython.org — "Quick reference for the RP2040"
Questions? Get in touch:
jwilliams.science · HalesAir Project