A cacophony of data instruments, formats and practices collides in the field of materials science. And now the recorded tune is one of a “reproducibility crisis,” in which research cannot be confirmed by the larger scientific community and therefore shrivels into unverified results.
“The key to reproducibility is data management,” said Dr. Robert Hanisch, Director of Data and Informatics at the National Institute of Standards and Technology (NIST).
Hanisch suggests that the so-called reproducibility crisis is not an actual crisis in science; rather, poor data management and esoteric technologies have isolated scientists from the broader scientific community. But that’s fixable.
Furthering NIST’s mission of promoting scientific advances, Hanisch travels around the world to bring data management to the heart of science and advocate for the use of open data.
The approach comes in several forms. Hanisch’s office publishes public files of open data and updates officially recognized standards and measurements to standardize data across the community. Hanisch also speaks 15 to 20 times a year at events and serves on international research consortia, promoting open data formats such as XML and JSON to industry providers of scientific instruments, which often output data in proprietary formats.
“We’re trying really, really hard to be good citizens and to enable the industry sector to take advantage of the tools and experience that we are developing here,” Hanisch said.
By offering free data management tools in public GitHub repositories, NIST hopes to encourage industry to invest in letting scientific instruments and solutions export data in open, machine-readable formats. With open formats, the larger scientific community can analyze raw findings directly.
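To make “open, machine-readable” concrete, here is a minimal sketch of instrument output written as JSON rather than a vendor-specific binary format; the instrument name and field names are hypothetical:

```python
import json

# Hypothetical instrument readings expressed as plain key-value data
# rather than a vendor-specific binary blob.
measurement = {
    "instrument": "XRD-2000",        # hypothetical instrument model
    "units": "counts",
    "wavelength_nm": 0.15406,        # Cu K-alpha, a common X-ray source
    "readings": [1024, 1187, 1342],
}

# Any tool or language with a JSON parser can read this file as-is,
# so other researchers can analyze the raw findings directly.
with open("measurement.json", "w") as f:
    json.dump(measurement, f, indent=2)
```

The same readings in a closed binary format would first have to be decoded with vendor documentation, if that documentation exists at all.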
Currently, NIST researchers have to reverse engineer data, performing complex calculations to standardize their findings and account for instrument specifications, before the results can be widely analyzed. Interest in the open option is growing among industry experts because expanding from proprietary to open formats would encourage widespread use of their products, Hanisch said.
Of course, plenty of data management issues exist for academic and public-sector researchers, too. Hanisch engages with research institutions and universities to implement data management “carrots and sticks.”
Carrots can include widespread publicity, tenure and promotion for publishing open, standardized data, while sticks impose penalties and requirements to discourage poor data management. NIST itself requires that scientists engaged in data-producing research provide a data management plan.
“Publish that software as well as the data, and as well as your interpretation of the data, so that it’s fully open for others to scrutinize and to assess whether the processing you’ve done is the best thing. If you do that, then science can advance in a fully transparent way,” Hanisch said.
Hanisch noted that the federal landscape for data management is “uneven,” but improving. He added that some federal agencies were model data stewards.
One such organization is NASA, where Hanisch oversaw the Hubble Space Telescope Data Archive. The space agency has used a standardized data format, FITS (Flexible Image Transport System), since the late 1970s, and the broader astronomy community shares it.
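As a sketch of what such a shared standard enables, the snippet below reads a FITS file with the open-source astropy library (the filename is hypothetical); any FITS-aware tool in any language could open the same data:

```python
from astropy.io import fits

# Open a FITS file (hypothetical filename) and inspect its contents.
with fits.open("observation.fits") as hdul:
    hdul.info()                    # list the header/data units (HDUs)
    header = hdul[0].header        # human-readable metadata keywords
    data = hdul[0].data            # the numeric image or table data
    print(header.get("TELESCOP"))  # standard keyword naming the telescope
```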
But while it is difficult to mimic that success across disparate scientific fields, guidance such as the Federal Data Strategy and the OPEN Government Data Act is helping government agencies get on the same page. Leaders in data management, such as those at NIST, NASA, the Commerce Department and the Small Business Administration, have helped to orchestrate federal laws and guidelines on data in government.
This blog post is an excerpt from GovLoop’s recent guide “7 Tips to Transform Your Data Into Compelling Stories.” Download the full guide here.
Photo credit: Samuel Sianipar on Unsplash