As in-situ sensors used for environmental monitoring produce ever larger volumes of data, anomalous data points must be detected automatically so that they can be removed, investigated further, or trigger corrective action.
Automated anomaly detection in environmental applications currently relies mainly on supervised machine learning models, which require a fully labelled data set for training. These models may also need frequent retraining, because environmental data exhibit seasonal changes.
However, labelling data to train and update these models typically requires visual inspection by a domain expert. This task is cumbersome and therefore hinders the adoption of machine learning methods for automated anomaly detection.
In this work, we address this challenge with active learning, in which the domain expert is queried for the labels of only a subset of the full data set. We show that this reduces the time and cost associated with labelling while delivering the same or similar anomaly detection performance. Interestingly, the active learning strategy favours anomalous data samples when selecting which samples to query. Finally, we show that flexible machine learning models are preferable for anomaly detection in complex environmental data sets.
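To make the querying idea concrete, the following is a minimal sketch of pool-based active learning in Python. It is not the model or query strategy used in this work. It assumes a single hypothetical feature (absolute deviation of each reading from the pool median), a simple threshold classifier standing in for a more flexible model, and anomalies that are separable from normal samples on that feature. At each step it queries the unlabelled sample closest to the current decision boundary, a basic form of uncertainty sampling, and stops early once no ambiguous samples remain. The `oracle` callable stands in for the domain expert; `deviation_features` and `active_learning_threshold` are illustrative names only.

```python
import random
import statistics


def deviation_features(readings):
    """Hypothetical 1-D feature: absolute deviation of each reading from the pool median."""
    centre = statistics.median(readings)
    return [abs(r - centre) for r in readings]


def active_learning_threshold(features, oracle, budget):
    """Pool-based active learning with a threshold classifier on one feature.

    Assumes larger feature values mean "anomalous" and that the two classes are
    separable on this feature. `oracle(i)` plays the domain expert and returns
    1 (anomaly) or 0 (normal) for sample i. Returns (threshold, labels), where
    `labels` maps each queried index to its label.
    """
    n = len(features)
    labels = {}
    # Seed set: the most typical and the most extreme sample, assumed to
    # cover both classes before the first fit.
    for i in (min(range(n), key=features.__getitem__),
              max(range(n), key=features.__getitem__)):
        labels[i] = oracle(i)

    def gap():
        # Largest labelled-normal value and smallest labelled-anomaly value.
        normal = [features[i] for i, y in labels.items() if y == 0]
        anomalous = [features[i] for i, y in labels.items() if y == 1]
        if not normal or not anomalous:
            raise ValueError("seed set must contain both classes")
        return max(normal), min(anomalous)

    while len(labels) < budget:
        lo, hi = gap()
        threshold = (lo + hi) / 2
        # Only unlabelled samples strictly inside the gap are ambiguous.
        candidates = [i for i in range(n)
                      if i not in labels and lo < features[i] < hi]
        if not candidates:
            break  # stop early: nothing useful is left for the expert to label
        # Uncertainty sampling: query the sample closest to the decision boundary.
        query = min(candidates, key=lambda i: abs(features[i] - threshold))
        labels[query] = oracle(query)

    lo, hi = gap()
    return (lo + hi) / 2, labels


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic pool (illustrative only): 190 normal readings around 20
    # and 10 spikes; `truth` stands in for the expert's judgement.
    readings = [20 + rng.uniform(-3, 3) for _ in range(190)]
    readings += [20 + rng.choice((-1, 1)) * rng.uniform(5, 10) for _ in range(10)]
    truth = [0] * 190 + [1] * 10
    threshold, labels = active_learning_threshold(
        deviation_features(readings), lambda i: truth[i], budget=20)
    print(f"labelled {len(labels)} of {len(readings)} samples; "
          f"{sum(labels.values())} of them anomalous; threshold={threshold:.2f}")
```

Because each query targets the boundary between the two classes, the labelled set in a sketch like this may contain a much larger share of anomalies than the pool itself, which is one plausible way to read the selection behaviour described above. With a flexible model and real sensor data, the boundary, and therefore the queried samples, will differ.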