Smart Semi-Supervised Accumulation of Large Repositories for Industrial Control Systems Device Information...

by Kimia Ameri, Hamid Sharif, Michael Hempel, Juan Lopez Jr, Kalyan S Perumalla
Publication Type
Conference Paper
Book Title
ICCWS 2021 - Proceedings of 16th International Conference on Cyber Warfare and Security
Publication Date
Conference Name
ICCWS 2021: Sixteenth International Conference on Cyber Warfare and Security
Conference Location
Cookeville, Tennessee, United States of America
Conference Sponsor
Academic Conferences International
Conference Date

Industrial Control Systems (ICS) device manufacturers frequently add new features to improve their product performance. Oftentimes, these changes are mainly vendor-driven initiatives, and customers may not be aware of the full impact of these new capabilities on their cybersecurity posture. In the energy sector, this can lead to considerable dissonance between vendor-provided cybersecurity claims and a customer’s responsibility for Operational Technology (OT) cybersecurity compliance. The resulting dynamic verification burden is thus shifted towards the customer and may pose a significant cybersecurity risk to the energy sector landscape. We found that there is very limited research into cybersecurity auditing for Operational Technology, yet a solution is needed for vetting vendor-supplied feature claims and their adherence to cybersecurity requirements and standards. We are presently engaged in an effort to develop such a system. This paper demonstrates one vital aspect of this effort by proposing an end-to-end framework to accumulate a large repository of ICS device information for this vetting system, curate the dataset, and conduct extensive processing. The framework is designed to use web scraping, data analytics, and Natural Language Processing (NLP) techniques to identify vendor websites, automate the collection of website-accessible documents, and automatically derive metadata from them to identify product documents relevant to the repository. To our knowledge, this automated approach to vendor identification, document extraction into a product repository, and NLP pre-processing is unique and has not been previously presented in the literature. Our preliminary work shows that the approach is feasible and can produce reliable results with minimal supervision.
Future work will be built upon this foundation in order to achieve semi-supervised vetting of device technical information – a vital capability for ensuring that vendor-claimed device cybersecurity capabilities match industry requirements.
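
As a rough illustration of the document-collection step described in the abstract, the following is a minimal sketch and not the authors' implementation. It extracts links to document files from a vendor page's HTML and applies a simple keyword filter as a stand-in for the NLP-based relevance check. The names `DocumentLinkParser`, `find_documents`, and `looks_relevant`, the extension list, the keyword list, and the example URL are all assumptions introduced for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Assumed values for illustration; the paper does not specify these.
DOC_EXTENSIONS = {".pdf", ".doc", ".docx"}
KEYWORDS = {"manual", "datasheet", "firmware", "specification"}


class DocumentLinkParser(HTMLParser):
    """Collects (absolute_url, anchor_text) pairs for links to document files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Resolve relative links against the page URL.
            url = urljoin(self.base_url, self._href)
            ext = posixpath.splitext(urlparse(url).path)[1].lower()
            if ext in DOC_EXTENSIONS:
                anchor = " ".join("".join(self._text).split())
                self.links.append((url, anchor))
            self._href = None


def find_documents(html, base_url):
    """Return document links found in an HTML string."""
    parser = DocumentLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def looks_relevant(url, anchor_text):
    """Crude keyword match standing in for NLP-derived relevance metadata."""
    haystack = (url + " " + anchor_text).lower()
    return any(keyword in haystack for keyword in KEYWORDS)
```

In a fuller pipeline, the HTML would come from a crawler and the keyword filter would be replaced by metadata derived from the downloaded documents themselves; this sketch only covers link discovery on a single page.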