Abstract
As freshwater scarcity grows, advanced process configurations such as Closed-Circuit Reverse Osmosis (CCRO), along with digital and physical twin systems, are gaining traction in water treatment and reuse. Although digital and physical twins improve system insight and control, developing them is often expensive and computationally intensive, because large volumes of synthetic or experimental data are needed to characterize the underlying process dynamics. This work introduces a sparse surrogate modeling framework that estimates power consumption from measured flow and pressure variables and from their nonlinear polynomial and interaction terms. To improve model reliability and reduce overfitting, a two-stage pipeline is proposed. First, a dynamic data filtering algorithm removes uninformative observations and transient operating states. Second, a sparse penalized regression technique selects a minimal subset of features. The resulting model retains only 7 of 34 candidate features (≈79.41% sparsity) while achieving a root mean square error (RMSE) of 0.072 on the test dataset.
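A minimal sketch of a two-stage pipeline of this kind, in Python with NumPy on synthetic data. It is not the authors' implementation. The hypothetical helper `filter_transients`, which drops any sample where an input jumps by more than an assumed `max_step`, is a simple stand-in for the dynamic data filtering algorithm. Plain L1-penalized least squares (Lasso), solved by coordinate descent in the hypothetical helper `lasso_cd`, stands in for the sparse penalized regression technique. The three input columns, the assumed power relation, the injected overshoot transients, and `alpha` are all illustrative assumptions, so the selected features and RMSE will not reproduce the reported 7-of-34 result.

```python
import numpy as np


def expand_features(X):
    """Degree-2 candidate library: raw terms, squares, and pairwise interactions."""
    n = X.shape[1]
    cols = [X[:, i] for i in range(n)]
    cols += [X[:, i] ** 2 for i in range(n)]
    cols += [X[:, i] * X[:, j] for i in range(n) for j in range(i + 1, n)]
    return np.column_stack(cols)


def filter_transients(X, y, max_step):
    """Hypothetical stand-in filter: drop samples where any input jumps by more than max_step."""
    step = np.abs(np.diff(X, axis=0)).max(axis=1)
    keep = np.concatenate([[True], step <= max_step])
    return X[keep], y[keep]


def lasso_cd(Z, y, alpha, n_iter=300):
    """Coordinate descent for min_w (1/2m)||y - Zw||^2 + alpha*||w||_1 (Z standardized, y centered)."""
    m, p = Z.shape
    w = np.zeros(p)
    r = y.copy()                          # residual y - Z @ w, starting from w = 0
    col_sq = (Z ** 2).sum(axis=0) / m     # equals 1 for standardized columns
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * w[j]           # remove feature j's current contribution
            rho = Z[:, j] @ r / m
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft threshold
            r -= Z[:, j] * w[j]           # add back the updated contribution
    return w


# Synthetic stand-in data (assumed): 100 steady operating points, each held for 4 samples.
rng = np.random.default_rng(0)
levels = rng.uniform(-1.0, 1.0, size=(100, 3))
X = np.repeat(levels, 4, axis=0) + rng.normal(0.0, 0.005, size=(400, 3))
y = 2.0 * X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 0.01, size=400)
y[4::4] += 0.5                            # assumed overshoot right after each step change

# Stage 1: remove transient samples.
Xf, yf = filter_transients(X, y, max_step=0.05)

# Stage 2: expand candidates, standardize with training statistics, fit the L1 model.
F = expand_features(Xf)
split = int(0.75 * len(yf))
mu, sd = F[:split].mean(axis=0), F[:split].std(axis=0)
Z = (F - mu) / sd
y_mean = yf[:split].mean()

w = lasso_cd(Z[:split], yf[:split] - y_mean, alpha=0.02)
pred = Z[split:] @ w + y_mean
rmse = float(np.sqrt(np.mean((yf[split:] - pred) ** 2)))
support = np.flatnonzero(w)               # indices of retained candidate features
sparsity = 1.0 - len(support) / w.size
print(f"kept {len(yf)}/{len(y)} samples, selected {support.tolist()}, "
      f"sparsity {sparsity:.2%}, test RMSE {rmse:.3f}")
```

Sparsity here follows the definition implied by the reported figure: one minus the fraction of candidates retained, so keeping 7 of 34 features gives 1 - 7/34 ≈ 79.41%. Standardizing the candidate columns before fitting matters because the L1 penalty is not scale-invariant and would otherwise favor features with larger numeric ranges.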