COBRA: Publication Discovery and Management System

Motivation

User facilities such as the Oak Ridge Leadership Computing Facility (OLCF) are required to report on their publications yearly. In the past, this data was collected through a slow and error-prone manual process. In that process, a person takes the facility user list, compiles potential publications by searching across multiple sources (Web of Science, Google Scholar, Scopus, etc.), and gathers the associated metadata into a spreadsheet. Because of the author disambiguation problem, one cannot assume that every paper matching a user name actually belongs to that researcher. Each record is therefore examined individually in an attempt to verify affiliation. By looking for evidence of affiliation in the full text, email addresses, acknowledgments, etc., a publication can usually be verified as belonging to a specific facility. COBRA was developed to replace this manual process with an automated, streamlined one.

Publication Discovery

COBRA uses several different methods during the publication discovery phase. First, it searches acknowledgment and funding agency text for related publications. Using the web APIs provided by Web of Science, all publications that reference an organization can be retrieved with complete, clean metadata. This method works well for universities, laboratories, and any other organization that asks its users and staff to include a specific acknowledgment in their publications.

Second, COBRA can identify publications based on publisher-supplied fields such as author affiliations. If an organization is interested in career publications, COBRA can also retrieve publications based on ORCID and ResearcherID identifiers.
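A minimal sketch of this kind of metadata-based matching follows. It assumes each record is a plain dictionary with hypothetical `affiliations` and `emails` fields; the function names, the field names, and the example organization terms are illustrative assumptions, not COBRA's actual implementation.

```python
import re

def normalize(text):
    """Lowercase and collapse punctuation/whitespace so spelling variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def matches_facility(record, affiliation_terms, email_domains):
    """Return the list of reasons (possibly empty) a record looks facility-related."""
    reasons = []
    # Compare whole words only: pad with spaces so "oak" does not match "oakland".
    for aff in record.get("affiliations", []):
        padded = f" {normalize(aff)} "
        for term in affiliation_terms:
            if f" {normalize(term)} " in padded:
                reasons.append(f"affiliation: {term}")
    # Accept an exact domain match or any subdomain of a listed domain.
    for email in record.get("emails", []):
        domain = email.rsplit("@", 1)[-1].lower()
        if any(domain == d or domain.endswith("." + d) for d in email_domains):
            reasons.append(f"email: {domain}")
    return reasons
```

For example, a record whose hypothetical affiliation string is "Oak Ridge National Laboratory, Oak Ridge, TN" would match the term "Oak Ridge National Laboratory" despite the trailing address text. Returning the reasons rather than a bare boolean keeps the evidence available for later manual verification.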

Lastly, COBRA can fully replicate the previous manual approach of searching for publications based on an author list. COBRA attempts to verify publications using publisher-supplied author affiliations and email addresses when applicable. When publications cannot be verified with metadata alone, COBRA can download the full-text document for the majority of records found. Using the full text, COBRA searches for user-supplied keywords such as computer names, facility names, etc. to reduce the list of possible publications to a manageable size. When a term uniquely identifies a specific organization, e.g. “Titan Cray Supercomputer”, COBRA automatically counts the matching publications as related. This filtering process yields a list of potential matches that can be quickly verified manually by looking at the provided in-text matches.
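The full-text filtering step can be sketched roughly as follows. This is an illustration under stated assumptions rather than COBRA's actual code: the function names, the three status labels, and the split into "unique" and "general" term lists are all hypothetical.

```python
import re

def find_matches(full_text, terms, context=40):
    """Return (term, snippet) pairs for each case-insensitive occurrence of a term."""
    hits = []
    for term in terms:
        # Allow any run of whitespace (including line breaks) between words.
        pattern = r"\s+".join(re.escape(word) for word in term.split())
        for m in re.finditer(pattern, full_text, flags=re.IGNORECASE):
            start = max(0, m.start() - context)
            end = min(len(full_text), m.end() + context)
            # Collapse whitespace so the snippet reads cleanly for a reviewer.
            snippet = " ".join(full_text[start:end].split())
            hits.append((term, snippet))
    return hits

def triage(full_text, unique_terms, general_terms):
    """Classify a document as 'related', 'candidate', or 'unrelated'."""
    # A unique identifier is treated as sufficient evidence on its own.
    unique_hits = find_matches(full_text, unique_terms)
    if unique_hits:
        return "related", unique_hits
    # General keywords only flag the record for manual review.
    general_hits = find_matches(full_text, general_terms)
    if general_hits:
        return "candidate", general_hits
    return "unrelated", []
```

Records classified as "candidate" carry their snippets with them, which corresponds to the in-text matches a reviewer would inspect during manual verification.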

Publication Management

Once the discovery process has been completed, COBRA offers many features for publication management. It provides the user with an interactive web-based GUI for searching, filtering, and exporting the data. Because publication information also comes from multiple sources, including some that currently have no openly available API (self reports, internal systems, Google alerts, etc.), COBRA also allows publications to be entered into the database manually.

To make manual entry more efficient, COBRA can pull most of the metadata associated with a publication using its DOI. COBRA uses the Web of Science API and the Crossref API to automatically populate many fields such as title, authors, publication date, etc. COBRA also calculates metrics from the publication metadata, such as high impact, citation count, and highly cited publications, while keeping track of where publications currently are in the verification process.
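As an illustration of DOI-based population, the sketch below queries the public Crossref REST API (`https://api.crossref.org/works/{doi}`) and flattens the response into a few form fields. The function names and the output field names are hypothetical, and the sketch says nothing about how COBRA itself calls Crossref or Web of Science.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS_URL = "https://api.crossref.org/works/"

def fetch_crossref_record(doi, timeout=10):
    """Fetch the Crossref JSON 'message' for a DOI (requires network access)."""
    url = CROSSREF_WORKS_URL + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["message"]

def extract_fields(message):
    """Map a Crossref work record onto a flat dict for pre-filling an entry form."""
    titles = message.get("title") or [""]
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in message.get("author", [])
    ]
    date_parts = message.get("issued", {}).get("date-parts") or [[]]
    # Crossref dates may be year-only or year-month; keep whatever is present.
    published = "-".join(f"{n:02d}" for n in date_parts[0] if n is not None)
    journals = message.get("container-title") or [""]
    return {
        "doi": message.get("DOI", ""),
        "title": titles[0],
        "authors": authors,
        "published": published,
        "journal": journals[0],
    }
```

A call such as `extract_fields(fetch_crossref_record(doi))` would then return a dictionary suitable for pre-filling a manual entry form. Keeping the network call separate from the parsing makes the parsing step easy to check against saved responses.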

Future Work

Future work on the COBRA system will focus on data analytics and visual analytics. We are currently looking into developing a business intelligence (BI) style toolkit that would allow the user to easily generate many different charts and graphs based on facility interest. This will allow users to explore their publication data in new ways beyond the traditional summary counts given in today’s reports.

Related Publications

R. M. Patton, C. G. Stahl, J. B. Hines, T. E. Potok, and J. C. Wells. Multi-year content analysis of user facility related publications. D-Lib Magazine, 19(9/10), September/October 2013.
R. M. Patton, C. G. Stahl, T. E. Potok, and J. C. Wells. Identification of user facility related publications. D-Lib Magazine, 18(7/8), July/August 2012.