Abstract
Partial least squares regression (PLSR) and support vector regression (SVR) models were optimized for the quantification of U(VI) (10–320 g L−1) and HNO3 (0.6–6 M) by Raman spectroscopy, with calibration sets selected by optimal design of experiments. The designed approach minimized the number of samples in the calibration set for PLSR and SVR by selecting sample concentrations with a quadratic process model, despite confounding and covarying features in the spectra. The top PLS2 model resulted in percent root mean square errors of prediction (RMSEP) for U(VI), HNO3, and NO3− of 3.7%, 3.6%, and 2.9%, respectively. PLS1 models performed similarly, even though one analyte has a predominantly linear response (i.e., the uranyl symmetric stretch) and the other has more covarying vibrational modes (i.e., HNO3). Partial least squares (PLS) model loadings and regression coefficients were evaluated to better understand the relationship between weaker Raman bands and covarying spectral features. SVR models outperformed PLS1 models, resulting in percent RMSEP values for U(VI) and HNO3 of 1.5% and 3.1%, respectively. Although SVR is a nonlinear modeling approach and PLS is a linear one, the optimal SVR model was trained with a similar number of samples (11) to that used for the PLSR model. The generic D-optimal design presented in this work provides a robust statistical framework for selecting training set samples in disparate two-factor systems. This approach reinforces the use of Raman spectroscopy for quantifying species relevant to the nuclear fuel cycle and provides a chemometric modeling approach to bolster online monitoring in challenging process environments.