Clustering-Based Predictive Analytics to Improve Scientific Data Discovery...

by Ranjeet Devarakonda, Jitendra Kumar, Giri Prakash
Publication Type
Conference Paper
Journal Name
IEEE Xplore Digital Library
Book Title
2020 IEEE International Conference on Big Data (Big Data)
Page Numbers
5658 to 5661
Publisher Location
District of Columbia, United States of America
Conference Name
IEEE Big Data Conference
Conference Location
Atlanta, Georgia, United States of America

Given the sheer volume of scientific data archived within the data-intensive projects at the US Department of Energy's Oak Ridge National Laboratory, finding precisely the data a user is looking for is not a trivial task; conversely, a user may also miss a more prominent data product. To address these issues, we propose improving the data discovery system by using data analytics methods to infer what specific users might be interested in, based on their physiological state, search patterns, and past data usage history. The primary goal of this work is to reduce complexity, increase the visibility of popular data products, and direct users toward the data that best meet their needs. The proposed algorithm constructs a user profile from the user's explicit or implicit interactions with the system, such as the items they are currently viewing on the site and the key metadata mappings related to each data set. The resulting interaction pattern is then used to build a training data set, which helps find relevant data to recommend to the user.
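The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of profile-based recommendation, not the authors' algorithm. It assumes a user profile can be represented as keyword counts drawn from the metadata of viewed data sets plus search terms, and it swaps in plain cosine similarity for the clustering step named in the title. The catalog, data set identifiers, keywords, and function names are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical catalog: data set id -> metadata keywords (illustrative only).
CATALOG = {
    "ds_precip": ["precipitation", "radar", "atmosphere"],
    "ds_cloud": ["cloud", "radar", "atmosphere"],
    "ds_soil": ["soil", "carbon", "ecosystem"],
    "ds_aerosol": ["aerosol", "atmosphere", "lidar"],
}


def build_profile(viewed_ids, search_terms):
    """Combine implicit signals (viewed items) with explicit ones (search terms)."""
    profile = Counter()
    for ds_id in viewed_ids:
        profile.update(CATALOG[ds_id])  # metadata keywords of each viewed item
    profile.update(search_terms)
    return profile


def cosine(a, b):
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(profile, viewed_ids, top_n=2):
    """Rank data sets the user has not yet viewed by similarity to the profile."""
    scores = {
        ds_id: cosine(profile, Counter(keywords))
        for ds_id, keywords in CATALOG.items()
        if ds_id not in viewed_ids
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# A user who viewed the precipitation data set and searched for "radar".
profile = build_profile(["ds_precip"], ["radar"])
recommendations = recommend(profile, ["ds_precip"])
print(recommendations)
```

In this toy setup the cloud data set ranks first because it shares both "radar" and "atmosphere" with the profile. A clustering-based variant would presumably group users or data sets before ranking, but the abstract does not specify how.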