Statistical Anomaly Detection: Instant Alerts for Rhine Contamination

Industry
Water & Environmental
Country
Netherlands

Executive Summary

Built a statistical anomaly detection system for a Dutch water consortium to monitor Rhine basin contamination. The system replaced days of manual analysis with instant automated alerts across 40+ discharge facilities and monitoring stations. Operators receive alerts with geospatial correlations and substance classifications, enabling them to investigate and respond quickly. Successfully detected a major PFBS discharge event (27,000 ng/l) and validated 233 facility-monitoring correlations across 24 water bodies.

Challenge

A consortium of Dutch water companies needed to detect contamination in the Rhine before it reached the Dutch border. Their existing workflow relied on manual analysis of complex time-series water quality data from multiple Rhine basin sources. This approach was slow, prone to bottlenecks, and often delayed the detection of critical contamination events, potentially putting public water safety at risk.

Approach

The real need wasn't just faster analysis; it was an automated early warning system that could monitor 40+ facilities continuously with minimal false alarms.

Key considerations:

  • Multiple data sources with different formats and frequencies
  • Sparse, inconsistent data requiring robust statistical methods
  • Zero tolerance for false positives, since false alarms would erode operator trust
  • Need to correlate events across 158.5 km of the Rhine basin

Solution

I designed and built a statistical anomaly detection system with the following components:

  • Threshold configurations: 4 different sensitivity levels, allowing operators to tune detection based on their risk tolerance
  • Geospatial correlation engine: Links discharge facilities to downstream monitoring stations across 158.5 km of the Rhine basin
  • Substance classification database: Covers 385 chemicals, categorized into PFAS and Non-PFAS groups
  • Automated pipeline: Ingests data from 40+ discharge facilities and monitoring stations and processes it on demand, producing alerts instantly
  • Clear visualizations: Generated informative charts and maps to help operators quickly understand contamination patterns, correlations, and alert priorities
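
To illustrate the idea behind the geospatial correlation engine, here is a minimal sketch of linking a discharge facility to its downstream monitoring stations. It is not the production implementation: the Site class, the river_km field, and the downstream_stations function are hypothetical names, and the sketch assumes each site can be placed on a single river-kilometre axis per water body. The default cut-off simply borrows the 158.5 km span quoted above as an illustrative value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A discharge facility or monitoring station (hypothetical structure)."""
    name: str
    water_body: str
    river_km: float  # assumed to increase in the downstream direction


def downstream_stations(facility, stations, max_distance_km=158.5):
    """Return (station, distance_km) pairs downstream of the facility.

    Only stations on the same water body and within max_distance_km are
    kept; results are sorted from nearest to farthest.
    """
    pairs = []
    for station in stations:
        distance = station.river_km - facility.river_km
        same_water = station.water_body == facility.water_body
        if same_water and 0 < distance <= max_distance_km:
            pairs.append((station, distance))
    return sorted(pairs, key=lambda pair: pair[1])


# Made-up example data, not taken from the project.
facility = Site("Plant A", "Rhine", 700.0)
stations = [
    Site("S1", "Rhine", 650.0),  # upstream, ignored
    Site("S2", "Rhine", 760.0),
    Site("S3", "Rhine", 720.0),
    Site("S4", "Main", 710.0),   # different water body, ignored
    Site("S5", "Rhine", 900.0),  # beyond the cut-off, ignored
]
links = downstream_stations(facility, stations)
```

A single river-kilometre axis is a simplification; linking facilities on tributaries to stations on the main stem would need a river network graph rather than a simple distance difference.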

The system was designed to flag anomalies automatically while providing operators with the context they need to investigate and respond.
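
As a rough illustration of how sigma-based thresholds with configurable sensitivity can work, the sketch below flags readings that exceed a baseline mean by a chosen number of standard deviations. It is an assumption-laden example rather than the system's actual code: the preset names and every multiplier other than 1.0σ are placeholders, flag_anomalies is a hypothetical function, and it uses only the Python standard library to stay dependency-free.

```python
from statistics import mean, stdev

# Hypothetical sensitivity presets, expressed as multiples of the standard
# deviation. Only the 1.0 sigma level appears in this case study; the
# other three values are placeholders for illustration.
SENSITIVITY_LEVELS = {
    "high": 1.0,
    "medium": 2.0,
    "low": 3.0,
    "critical_only": 4.0,
}


def flag_anomalies(baseline, new_readings, n_sigma):
    """Return the new readings above baseline mean + n_sigma * std."""
    if len(baseline) < 2:
        # Too few points to estimate spread; flag nothing rather than guess.
        return []
    limit = mean(baseline) + n_sigma * stdev(baseline)
    return [value for value in new_readings if value > limit]


# Made-up concentrations (ng/l) for one substance at one station.
baseline = [10.0, 12.0, 11.0, 9.0, 13.0]
new_readings = [12.0, 14.0, 500.0]

for level, n_sigma in SENSITIVITY_LEVELS.items():
    print(level, flag_anomalies(baseline, new_readings, n_sigma))
```

Lower multipliers catch smaller deviations at the cost of more alerts, which is the trade-off operators tune when choosing a sensitivity level. With sparse data, a robust estimator such as median and MAD may be preferable to mean and standard deviation.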

Outcome

  • Real contamination detected: The system successfully flagged a major PFBS discharge event (27,000 ng/l), a real threat that required immediate attention
  • Analysis time: Reduced from days of manual work to instant, on-demand alerts
  • Validated correlations: 233 facility-monitoring correlations confirmed across 24 water bodies
  • Optimal threshold identified: A 1.0σ threshold detected 36 anomalous substances with high reliability

Tech Stack

Python • Pandas • NumPy • Statistical Analysis • Geospatial Mapping • Custom Data Pipeline • PDF Processing • Time-Series Analysis