Abstract
Machine-derived sentiment analysis has become a pervasive and useful tool for addressing a wide array of problems in natural language processing. Leading technology companies such as Google now provide sentiment analysis tools (SATs) as readily accessible online products, and academic researchers develop and release SATs to support the research enterprise. A major challenge is that different SATs often produce inconsistent results for the same text. Consequently, the choice of SAT for a specific purpose may significantly affect the application. This study addresses that problem by using structural equation modeling to merge the outputs of multiple SATs into a combined sentiment metric, without the need for a labeled training dataset. The method is applicable to a wide range of text-based problems, is data-driven, and is replicable. It was tested on three publicly available datasets and compared against seven different SATs. The results indicate that, as a continuous measure, the proposed method outperformed the individual SATs on the movie reviews and SemEval datasets and tied with IBM Watson for first place on the Sentiment140 dataset. In addition, compared with the main published alternative, the arithmetic mean of SAT outputs, this approach performed better on all three datasets.
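To make the idea of merging SAT outputs without labels more concrete, the following Python sketch combines the scores of several tools into one composite and compares it with the arithmetic mean baseline. It is only an illustration under stated assumptions: it uses the leading principal component of the tool correlation matrix as a simple stand-in for a one-factor measurement model, not the structural equation modeling estimation used in the study. The function name `combine_scores`, the synthetic data, and the noise levels are all hypothetical.

```python
import numpy as np


def combine_scores(scores: np.ndarray) -> np.ndarray:
    """Merge per-tool sentiment scores into one composite, without labels.

    scores: array of shape (n_texts, n_tools), one column per tool.
    Returns an array of shape (n_texts,).

    Assumption: the leading principal component stands in for a single
    latent "sentiment" factor; this is not a full SEM fit.
    """
    # Put every tool on a common scale (zero mean, unit variance).
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)

    # Correlation matrix between tools. eigh returns eigenvalues in
    # ascending order, so the last eigenvector is the leading one.
    corr = np.corrcoef(z, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)
    weights = eigvecs[:, -1]

    # Eigenvectors have arbitrary sign; orient the weights so that higher
    # tool scores push the composite up.
    if weights.sum() < 0:
        weights = -weights

    # Weighted sum of the standardized tool scores.
    return z @ weights


# Synthetic example (hypothetical data): four tools observe the same
# latent sentiment with different amounts of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
noise_sd = np.array([0.3, 0.5, 1.0, 2.0])
scores = latent[:, None] + rng.normal(size=(500, 4)) * noise_sd

combined = combine_scores(scores)
mean_baseline = scores.mean(axis=1)  # the arithmetic mean alternative

print("composite vs latent:", np.corrcoef(combined, latent)[0, 1])
print("mean vs latent:     ", np.corrcoef(mean_baseline, latent)[0, 1])
```

In this sketch the weights are learned from the agreement among tools alone, which mirrors the abstract's point that no labeled training data is required; the arithmetic mean, by contrast, weights every tool equally regardless of how noisy it is.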