GO FAIR Data Stewardship Initiative Launched at UCSD. So What Does It Mean For Ag?
GO FAIR is an initiative to promote and support data stewardship that allows data to be Findable, Accessible, Interoperable, and Reusable. I was pleased to attend the launch of the first North American FAIR network last week at the San Diego Supercomputer Center at UC San Diego.
Coping with a Data Tsunami
To say that we live in a data-rich world is an understatement. We live in a data-drenched world (a fact I'm constantly reminded of by the 'hard drive full' warnings that pop up on my computer on a weekly basis). Thanks to simultaneous, order-of-magnitude advances in our ability to produce, disseminate, and store all manner of data, people working in fields from economics to physics to agriculture are struggling to benefit from, rather than be paralyzed by, the volume and diversity of data we produce. And this is by no means a problem affecting only academics, as more and more individuals, private companies, and organizations are collecting and working with large volumes of data, from personal health sensors to drones.
Adding to the challenge, there are often major barriers to getting datasets to talk to each other. They may be stored in different formats, use different scales or units of analysis, or be subject to different restrictions. If you've ever carried personal health data from one doctor's office to another by hand, you know what I mean.
FAIR Data Stewardship Principles
These are not new problems, but they have taken on an increased sense of urgency as the challenge gets worse and the demand for integrated analyses of complex problems grows. GO (Global Open) FAIR is a European-based initiative that has two faces: i) a set of principles for data stewardship, and ii) a growing network of institutions and programs that are taking tangible steps toward a world in which data are Findable, Accessible, Interoperable, and Reusable. FAIR doesn't mean that collected data have to be free or open access, but data stewards should have a way to share information about the existence of their data, and a means of providing access when appropriate.
The FAIR principles mirror what open science advocates have argued for many years. As a program, GO FAIR has gained more traction than many of its predecessors. Following endorsements from the European Commission and other international bodies, the EU has already committed €2 billion to the first phase of implementation. Starting in 2018, the major EU funding agencies will require applicants to submit data stewardship plans that align with the FAIR principles. The initiative is also investing a lot in training people to use metadata standards and tools, many of which already exist.
How is This Relevant for ANR?
ANR academics are impacted by the data tsunami in at least two ways (neither of them good). Like all practicing scientists, we have to deal with the usual challenges of managing large volumes of data, the frustrations of not being able to find or use data that others have collected, and the burden of all the gymnastics one must do to combine data from different sources into a robust, repeatable analysis. On top of that, as public servants whose work is funded by taxpayers, we have an additional moral and legal responsibility to be good stewards of all data collected for our public mission, which means ensuring the data we collect remain discoverable and accessible for other studies. Similarly, our extension mission requires us to help California growers and land stewards get the most value from the data they collect, with tools that address their requirements for privacy and security.
While this may all seem like a lot to think about and additional work, the rewards are pretty exciting as the following video shows:
How Close are Your Data to Being FAIR?
For many of us, putting the principles of FAIR data stewardship into practice will require a step or two we're not accustomed to, such as i) generating metadata in a format that can be read by both people and machines, and ii) storing our data (and metadata) for the long term. The table below, from a recent article in Scientific Data (a Nature journal), breaks down the gold standard a little further.
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1 (meta)data are released with a clear and accessible data usage license
R1.2 (meta)data are associated with detailed provenance
R1.3 (meta)data meet domain-relevant community standards
Wilkinson, Mark D., et al. "The FAIR Guiding Principles for scientific data management and stewardship." Scientific Data 3 (2016): 160018.
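To make the idea of metadata that both people and machines can read a bit more concrete, here is a minimal Python sketch of a metadata record stored as JSON, with a simple check for a few of the principles in the table above. The field names, the mapping of fields to principles, and the identifiers are my own illustrative assumptions, not an official FAIR schema or tool; a real project would follow a metadata standard from its own community (R1.3).

```python
import json

# Hypothetical field names, each mapped to the principle it loosely supports.
# This is an illustrative checklist, not an official FAIR schema.
REQUIRED_FIELDS = {
    "identifier": "F1: globally unique, persistent identifier",
    "title": "F2: rich descriptive metadata",
    "description": "F2: rich descriptive metadata",
    "data_identifier": "F3: metadata names the data it describes",
    "license": "R1.1: clear data usage license",
    "provenance": "R1.2: detailed provenance",
}


def missing_fair_fields(record):
    """Return a label for each checklist field that is absent or empty."""
    return [
        f"{field} ({principle})"
        for field, principle in REQUIRED_FIELDS.items()
        if not record.get(field)
    ]


# Example record; the identifiers are placeholders, not real DOIs.
record = {
    "identifier": "doi:10.0000/example-metadata",
    "title": "Orchard drone survey (example)",
    "description": "Multispectral imagery from one example flight.",
    "data_identifier": "doi:10.0000/example-data",
    "license": "CC-BY-4.0",
    "provenance": "",  # left empty on purpose to show a gap
}

# JSON is readable by people and trivially parsed by machines.
print(json.dumps(record, indent=2))
print(missing_fair_fields(record))
```

Running this prints the record as JSON and then lists the provenance field as the one gap. A check like this says nothing about whether the metadata are actually registered in a searchable resource (F4) or retrievable by identifier (A1); those depend on where the record is published, not on what it contains.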
As a research technology unit, I think we're doing fairly well in terms of keeping our data organized and accessible for the long term. However, after looking at our data management practices through the FAIR lens, I now see that our metadata miss some important characteristics, that a lot of our quality metrics aren't machine readable, and that we need to learn more about metadata repositories and discoverability, particularly for our drone data. These are challenges common to many new sources of geospatial data, and we look forward to engaging with the new arm of the GO FAIR network to develop solutions.