Halesowen College · T Level Data Analytics
Do variables move together? How strongly?
Example: if temperature and humidity are strongly negatively correlated (r = −0.72), it might mean the sensor heats itself up and dries the air around it: a calibration artefact, not a physical relationship.
| | temp_c | hum_pct | pres_hpa | gas_ohm |
|---|---|---|---|---|
| temp_c | 1.00 | −0.71 | 0.12 | 0.48 |
| hum_pct | −0.71 | 1.00 | −0.09 | −0.22 |
| pres_hpa | 0.12 | −0.09 | 1.00 | 0.18 |
| gas_ohm | 0.48 | −0.22 | 0.18 | 1.00 |
Example correlation matrix from a 24-hour dataset
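A matrix like the one above comes from pandas' `DataFrame.corr()`, which computes Pearson's r by default. This is a minimal sketch, assuming your readings sit in a DataFrame with the column names used in the table. The data here are synthetic (humidity is deliberately built to fall as temperature rises) and are not HalesAir measurements.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a 24-hour log at 1-minute intervals (NOT real sensor data)
rng = np.random.default_rng(42)
n = 24 * 60
temp_c = 18 + 2 * np.sin(np.linspace(0, 2 * np.pi, n)) + rng.normal(0, 0.3, n)
df = pd.DataFrame({
    "temp_c": temp_c,
    # Humidity built to fall as temperature rises, to mimic a negative correlation
    "hum_pct": 60 - 3 * (temp_c - 18) + rng.normal(0, 2, n),
    "pres_hpa": 1013 + rng.normal(0, 1, n),
    "gas_ohm": 50_000 + 2_000 * (temp_c - 18) + rng.normal(0, 3_000, n),
})

# Pearson correlation between every pair of columns
corr = df.corr()
print(corr.round(2))
```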
pip install seaborn: built on matplotlib, it makes beautiful statistical visualisations with minimal code.
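A minimal heatmap sketch, assuming seaborn and matplotlib are installed. The synthetic DataFrame and the output file name `correlation_heatmap.png` are placeholders. `annot=True` prints each r value in its cell, and `vmin`/`vmax` pin the colour scale to the full −1 to 1 range so colours mean the same thing across plots.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to a file, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pathlib import Path

# Small synthetic DataFrame (illustrative only); swap in your own readings
rng = np.random.default_rng(0)
temp = rng.normal(19, 1.5, 500)
df = pd.DataFrame({
    "temp_c": temp,
    "hum_pct": 60 - 2.5 * (temp - 19) + rng.normal(0, 3, 500),
    "pres_hpa": rng.normal(1013, 2, 500),
})

corr = df.corr()
fig, ax = plt.subplots(figsize=(5, 4))
# Diverging palette centred on 0 separates positive from negative correlations
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix")
fig.tight_layout()
out = Path("correlation_heatmap.png")
fig.savefig(out, dpi=150)
plt.close(fig)
```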
Always follow up a correlation with a scatter plot to visually confirm the relationship isn't driven by outliers.
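Alongside the scatter plot, a quick numeric check is to compare Pearson's r with Spearman's rank correlation. Spearman works on ranks, so a single extreme reading moves it far less. The sketch below uses synthetic data with one injected spike (a hypothetical sensor glitch); a large gap between the two coefficients hints that outliers are driving the Pearson value.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: two unrelated variables plus one extreme reading (illustrative only)
rng = np.random.default_rng(1)
df = pd.DataFrame({"temp_c": rng.normal(19, 1, 50),
                   "gas_ohm": rng.normal(50_000, 2_000, 50)})
df.loc[len(df)] = [35.0, 120_000.0]  # a single spike, e.g. a glitch

pearson = df["temp_c"].corr(df["gas_ohm"])                      # sensitive to outliers
spearman = df["temp_c"].corr(df["gas_ohm"], method="spearman")  # rank-based, more robust
print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")

# The scatter plot makes the lone spike obvious at a glance
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="temp_c", y="gas_ohm", ax=ax)
fig.savefig("temp_vs_gas.png", dpi=150)
plt.close(fig)
```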
Patterns over time · Resampling · Rolling averages
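Resampling and rolling averages are both one-liners in pandas once the timestamps are the DataFrame index. This sketch assumes 1-minute readings in a `DatetimeIndex`; the two days of synthetic data and the 2-hour smoothing window are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute readings over two days (illustrative only)
idx = pd.date_range("2024-01-08 00:00", periods=2 * 24 * 60, freq="min")
rng = np.random.default_rng(2)
df = pd.DataFrame({"temp_c": 18 + rng.normal(0, 0.5, len(idx))}, index=idx)

# Resampling: collapse the 1-minute readings into 30-minute means
half_hourly = df.resample("30min").mean()

# Rolling average: smooth the 30-minute series over a 2-hour time window
half_hourly["temp_smooth"] = half_hourly["temp_c"].rolling("120min").mean()

print(half_hourly.head())
```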
Is one location significantly different from another?
Why Mann-Whitney? Our data may not be normally distributed (and it won't be, since environmental data rarely is). Mann-Whitney is a non-parametric test that doesn't assume normality.
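In Python the test is available as `scipy.stats.mannwhitneyu`. A minimal sketch, assuming two arrays of temperature readings; the synthetic values borrow the site means from the example finding purely for illustration and are not real results.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic temperatures from two sites (illustrative only, not real results)
rng = np.random.default_rng(3)
car_park = rng.normal(18.4, 1.0, 200)
courtyard = rng.normal(19.1, 1.0, 200)

# Two-sided: is either site's distribution shifted relative to the other?
result = mannwhitneyu(car_park, courtyard, alternative="two-sided")
print(f"U = {result.statistic:.0f}, p = {result.pvalue:.4f}")
```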
Box plots are the best visual companion to this test: they show median, IQR, and outliers side by side for both sites.
The box plot IS your results figure. Boxes that barely overlap suggest the sites differ, but overlap alone is not a significance test, so pair the plot with the p-value for a complete finding.
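A box plot per site can be drawn with seaborn's `boxplot`, assuming the data are in long format (one row per reading, with a `site` column). The synthetic readings and the output file name are placeholders.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pathlib import Path

# Synthetic long-format data: one row per reading, labelled by site (illustrative only)
rng = np.random.default_rng(4)
readings = pd.DataFrame({
    "site": ["car park"] * 100 + ["courtyard"] * 100,
    "temp_c": np.concatenate([rng.normal(18.4, 1.0, 100),
                              rng.normal(19.1, 1.0, 100)]),
})

fig, ax = plt.subplots(figsize=(4, 4))
# Each box: median line, IQR box, whiskers, and outlier points for one site
sns.boxplot(data=readings, x="site", y="temp_c", ax=ax)
ax.set_ylabel("Temperature (°C)")
fig.tight_layout()
out = Path("site_boxplot.png")
fig.savefig(out, dpi=150)
plt.close(fig)
```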
A finding is a result with an interpretation. Not just numbers, but statements.
Not yet a finding: Mean temperature at the car park was 18.4°C and at the courtyard was 19.1°C.
A finding: The courtyard recorded mean temperatures 0.7°C higher than the car park (Mann-Whitney U, p = 0.002), consistent with reduced airflow in an enclosed space.
What: the difference.
How much: magnitude.
Confidence: p-value or effect size.
Why: physical interpretation.
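The four-part structure can be partly automated: the data supply WHAT, HOW MUCH and CONFIDENCE, while WHY stays your own interpretation. `write_finding` below is a hypothetical helper, not part of any library. It reports a difference in medians rather than means because Mann-Whitney is rank-based; that is a design choice you may want to change.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def write_finding(a, b, name_a, name_b, why):
    """Build WHAT + HOW MUCH + CONFIDENCE from data; WHY is supplied by you."""
    diff = np.median(b) - np.median(a)
    p = mannwhitneyu(a, b, alternative="two-sided").pvalue
    p_text = "p < 0.001" if p < 0.001 else f"p = {p:.3f}"
    direction = "higher" if diff > 0 else "lower"
    return (f"The {name_b} recorded median temperatures {abs(diff):.1f}°C {direction} "
            f"than the {name_a} (Mann-Whitney U, {p_text}), {why}.")

# Synthetic readings (illustrative only)
rng = np.random.default_rng(5)
finding = write_finding(rng.normal(18.4, 1, 150), rng.normal(19.1, 1, 150),
                        "car park", "courtyard",
                        "consistent with reduced airflow in an enclosed space")
print(finding)
```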
Aim for 3–5 findings in your final report. Each should be one clear, evidenced sentence. They become the backbone of your discussion section and your presentation.
Goal: correlation heatmap + two time-series plots + one written finding.
(8 min) Compute the correlation matrix. Save it as a heatmap. Which pair has the strongest correlation? Is it positive or negative? Write one sentence explaining why this might be.
(10 min) Resample your data to 30-minute means. Plot temperature and IAQ (normalised gas) on the same time axis. Do you see any patterns matching school hours?
(10 min) Combine your dataset with a partner's (different site). Build a box plot comparing temperature between the two locations. Run the Mann-Whitney U test. What is your p-value?
(7 min) Write one complete finding following the structure above: WHAT + HOW MUCH + CONFIDENCE + WHY. This goes directly into your report draft.
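A compact sketch of the three coding tasks, assuming pandas, SciPy and matplotlib are installed. The helper names (`strongest_pair`, `min_max`, `compare_sites`) are hypothetical, min-max scaling is only one possible reading of "normalised gas", and all data except the example correlation matrix are synthetic.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

def strongest_pair(corr):
    """Task 1: return (col_a, col_b, r) for the off-diagonal pair with the largest |r|."""
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)  # drop diagonal and mirrored half
    pairs = corr.where(upper).stack()
    a, b = pairs.abs().idxmax()
    return a, b, pairs[(a, b)]

def min_max(series):
    """Task 2 (assumed normalisation): rescale a series to the range 0-1."""
    return (series - series.min()) / (series.max() - series.min())

def compare_sites(mine, partners, name_mine, name_partner, col="temp_c"):
    """Task 3: stack two site logs with a 'site' label and run Mann-Whitney U."""
    combined = pd.concat([mine.assign(site=name_mine),
                          partners.assign(site=name_partner)], ignore_index=True)
    result = mannwhitneyu(mine[col], partners[col], alternative="two-sided")
    return combined, result.pvalue

# Task 1 on the example matrix from this session
cols = ["temp_c", "hum_pct", "pres_hpa", "gas_ohm"]
example = pd.DataFrame(
    [[1.00, -0.71, 0.12, 0.48], [-0.71, 1.00, -0.09, -0.22],
     [0.12, -0.09, 1.00, 0.18], [0.48, -0.22, 0.18, 1.00]],
    index=cols, columns=cols,
)
a, b, r = strongest_pair(example)
print(f"Strongest pair: {a} vs {b}, r = {r:.2f}")  # Strongest pair: temp_c vs hum_pct, r = -0.71

# Task 2: synthetic school-day log, 30-minute means, two y-axes on one time axis
rng = np.random.default_rng(6)
idx = pd.date_range("2024-01-08 07:00", "2024-01-08 17:00", freq="min")
log = pd.DataFrame({"temp_c": rng.normal(18, 0.4, len(idx)),
                    "gas_ohm": rng.normal(50_000, 4_000, len(idx))}, index=idx)
half_hourly = log.resample("30min").mean()
half_hourly["gas_norm"] = min_max(half_hourly["gas_ohm"])

fig, ax_t = plt.subplots(figsize=(8, 3))
ax_t.plot(half_hourly.index, half_hourly["temp_c"], color="tab:red")
ax_t.set_ylabel("Temperature (°C)", color="tab:red")
ax_g = ax_t.twinx()  # second y-axis sharing the same time axis
ax_g.plot(half_hourly.index, half_hourly["gas_norm"], color="tab:blue")
ax_g.set_ylabel("Gas, normalised (0-1)", color="tab:blue")
fig.tight_layout()
fig.savefig("temp_gas_timeseries.png", dpi=150)
plt.close(fig)

# Task 3: a synthetic "partner" site, combined and tested
partner_log = pd.DataFrame({"temp_c": rng.normal(18.7, 0.4, len(idx))}, index=idx)
combined, p = compare_sites(log[["temp_c"]], partner_log, "car park", "courtyard")
print(f"Mann-Whitney U p = {p:.3g}")
```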
💡 Effect size matters as much as the p-value. A statistically significant but tiny difference (0.1°C) may not be practically meaningful. Always report both the statistical and the practical significance.
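One common effect size to pair with Mann-Whitney is the rank-biserial correlation, r = 2U₁/(n₁n₂) − 1, which runs from −1 to 1, with 0 meaning neither site tends to read higher. The sketch assumes SciPy 1.7 or later, where `mannwhitneyu(...).statistic` is the U for the first sample. The synthetic sites differ by only 0.1°C, yet with 5,000 readings each the p-value is typically very small while the effect size stays close to zero.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def rank_biserial(x, y):
    """Rank-biserial correlation from U: -1..1, where 0 means no effect."""
    u1 = mannwhitneyu(x, y, alternative="two-sided").statistic  # U for sample x
    return 2 * u1 / (len(x) * len(y)) - 1

# Synthetic sites 0.1°C apart with lots of data (illustrative only)
rng = np.random.default_rng(8)
site_a = rng.normal(19.0, 1.0, 5000)
site_b = rng.normal(19.1, 1.0, 5000)

p = mannwhitneyu(site_a, site_b, alternative="two-sided").pvalue
r = rank_biserial(site_a, site_b)
diff = np.median(site_b) - np.median(site_a)  # practical size, in °C
print(f"p = {p:.2g}, rank-biserial r = {r:.2f}, median difference = {diff:.2f}°C")
```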
Date TBC · Data Visualisation
Principles of effective data visualisation, choosing the right chart type for each finding, building a multi-panel summary figure, dashboard design basics, and colour accessibility.
Before Session 07: Have your 3–5 findings written up. We'll build a figure for each one.
Stretch task: Compute a 7-day rolling average if you have enough data. Do weekly trends differ from daily patterns?
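A 7-day rolling mean works the same way as shorter windows, using a time-based window string. This sketch assumes hourly readings (four synthetic weeks here). `min_periods=7 * 24` is an assumed rule that leaves values blank until a full week (168 hourly readings) is available, rather than reporting an average over a shorter span.

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings over four weeks with a daily cycle (illustrative only)
idx = pd.date_range("2024-01-01", periods=28 * 24, freq="60min")
rng = np.random.default_rng(9)
hour = idx.hour.to_numpy()
temp = 18 + 1.5 * np.sin(2 * np.pi * (hour - 9) / 24) + rng.normal(0, 0.3, len(idx))
df = pd.DataFrame({"temp_c": temp}, index=idx)

# Daily means show day-to-day variation
daily = df["temp_c"].resample("D").mean()

# 7-day rolling mean irons out the daily cycle to reveal the weekly trend
weekly_roll = df["temp_c"].rolling("7D", min_periods=7 * 24).mean()
print(daily.head())
print(weekly_roll.dropna().head())
```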
Questions?
jwilliams.science · HalesAir Project