Machine Learning–Driven Carbon Monoxide Prediction: A Case Study Using the UCI Air Quality Dataset

Publication Date: 01-07-2025

Author(s): Manzur Ashraf, Kishan Moradiya

Volume/Issue: Volume 1, Issue 1 (07-2025)

Abstract:

Accurate prediction of air pollution is essential for mitigating its adverse effects on human health, particularly those of carbon monoxide (CO) exposure. This paper presents a machine learning–based approach for forecasting CO concentration using the UCI Air Quality dataset, which consists of hourly sensor measurements collected in an urban area in Italy. Four regression models (Linear Regression, Decision Trees, Random Forest, and Gradient Boosted Trees (GBT)) were implemented and systematically evaluated. To capture diurnal variation in pollution levels, a temporal feature (Hour) was extracted from the timestamp data and incorporated into the models. All preprocessing, feature engineering, and model development were conducted in the KNIME Analytics Platform. Experimental results show that GBT augmented with the Hour feature achieved the highest predictive accuracy, with an R² of 0.921, while Random Forest performed poorly on this dataset. A comparison with prior studies based on Delhi air quality data highlights how strongly model performance depends on the dataset. The findings underscore the importance of rigorous data preprocessing and temporal feature engineering in improving air pollution prediction accuracy.
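
The Hour feature and the R² metric mentioned in the abstract can be illustrated with a minimal sketch. The study itself was built in KNIME, so the Python below is not the authors' workflow; it uses only the standard library. It assumes the date and time layout of the publicly distributed UCI file (`DD/MM/YYYY` and `HH.MM.SS`) and that file's convention of tagging missing readings with -200. The field names (`date`, `time`, `co`), the helper functions, and the sample rows are hypothetical.

```python
from datetime import datetime

# Assumption: the public UCI Air Quality file tags missing readings with -200.
MISSING = -200.0


def extract_hour(date_str, time_str):
    """Return the hour of day (0-23) from 'DD/MM/YYYY' and 'HH.MM.SS' strings."""
    stamp = datetime.strptime(f"{date_str} {time_str}", "%d/%m/%Y %H.%M.%S")
    return stamp.hour


def add_hour_feature(rows):
    """Drop rows whose CO reading is missing and attach an 'hour' field."""
    cleaned = []
    for row in rows:
        if row["co"] == MISSING:
            continue
        cleaned.append({**row, "hour": extract_hour(row["date"], row["time"])})
    return cleaned


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative rows only; field names are hypothetical.
sample = [
    {"date": "10/03/2004", "time": "18.00.00", "co": 2.6},
    {"date": "10/03/2004", "time": "19.00.00", "co": -200.0},
    {"date": "10/03/2004", "time": "20.00.00", "co": 2.2},
]
features = add_hour_feature(sample)
```

In a fuller pipeline, the `hour` value would be passed to a regressor alongside the sensor readings, and `r2_score` would compare held-out CO measurements with the model's predictions.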

The Australian Journal Of Artificial Intelligence Review