If you’ve done any digital work (and who isn’t doing digital work these days?), then you’ve probably heard the term “linked open data” tossed around. “Open data,” as defined by the Open Data Institute, is primary data that is freely available for others to use and share, is structured in a standard format that plays well with other file formats, and includes metadata (data about the data; check out this Guardian guide for more information) describing the data and how it was created. Barnett and Deliyannides (2011) define “open” resources as both a “family of copyright licensing policies under which authors and copyright owners make their works publicly available,” and a broader movement within “higher education to increase access to scholarly research and communication.” For graduate students, the “open” movement has provided numerous benefits, including free access to articles, resources, software, data, and more, all of which save us money and increase the efficiency of our research.
Linked open data, as defined by Berners-Lee, is a step beyond “open,” and allows for creating connections between resources. It is based on five principles:
- Data should be put online with an open license, such as a Creative Commons license, which can allow access to and use of a variety of resources and data.
- Data should be shared in a format that is both machine- and human-readable. This means formatting data in a manner that can be easily shared and reused, like using comma-separated values instead of listing items in a Word document.
- Any resources should be shared in non-proprietary formats. For example, when saving a table in Microsoft Excel, the default option is to save in Excel formats; however, if other individuals don’t have access to Excel, they won’t be able to use the data. By making sure we share in open formats that can be easily used in a variety of programs (e.g. comma separated values), we make our work more accessible. For a good list of file formats, check out this Open Data Handbook reference.
- Each item should have a unique identifier when it is put online. This gives it a stable identity so that individuals can share the link and resource with others. In practice, this means having an individual and stable URL or URI.
- Finally, the resource should include links to other data so users can discover more things. For example, provide links or digital object identifiers (DOIs, unique identifiers assigned to journal articles and other works) in the references section so that people can easily access your sources, or provide direct links to maps and other related datasets.
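To make the five principles a little more concrete, here is a minimal Python sketch of what preparing a small table for sharing might look like. Everything in it is hypothetical: the `example.org` addresses, the column names, the site records, and the helper functions `to_linked_rows` and `to_csv` are invented for illustration, and the license shown is just one possible Creative Commons choice. It is one way to approach the idea, not a standard workflow.

```python
import csv
import io
import json

# Hypothetical base address: each record gets a stable URI built from it.
BASE_URI = "https://example.org/dataset/sites/"

# Hypothetical records; in practice these would come from your own research.
records = [
    {"id": "1", "site": "Site A", "see_also": "https://example.org/maps/site-a"},
    {"id": "2", "site": "Site B", "see_also": "https://example.org/maps/site-b"},
]


def to_linked_rows(rows, base_uri):
    """Give each row a unique, stable URI (principle 4) and keep its outbound link (principle 5)."""
    return [
        {"uri": base_uri + row["id"], "site": row["site"], "see_also": row["see_also"]}
        for row in rows
    ]


def to_csv(rows):
    """Serialize rows as comma-separated values, a structured, non-proprietary format (principles 2 and 3)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["uri", "site", "see_also"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Sidecar metadata that states the open license (principle 1) and how the data was made.
metadata = {
    "title": "Example dataset",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "description": "Describe here how the data was collected.",
}

linked_rows = to_linked_rows(records, BASE_URI)
csv_text = to_csv(linked_rows)
metadata_text = json.dumps(metadata, indent=2)

print(csv_text)
print(metadata_text)
```

The CSV text and the metadata text could then be saved as two small files and uploaded together, so that anyone who finds the data also finds its license and a description of how it was created.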
While everyone must weigh the costs and benefits of open data for themselves, there are a number of benefits that I believe we can agree on:
- Open access to data provides greater impact for an individual’s work: By sharing your data you are making your work more accessible to others, increasing the likelihood that it will be reused and cited, which in turn opens up potential collaborations.
- If we open raw data, we reduce the risk of selection bias in subsequent studies. By making more data available online, we spread the research load, increase reuse of data, and improve our understanding by incorporating more samples into our studies. The more data we are aware of, the better our interpretations become, and the more we can work at larger scales of analysis to answer bigger questions.
- Access to open data improves efficiency of research within the broader discipline. Many of us spend hours adding data to tables, creating digital maps, and configuring our research. By sharing this data, we shift the focus of projects from producing data to interpreting it, and we lower the cost of producing data by reducing redundant effort.
- Finally, access to raw data and interpretations allows for unanticipated reuse in cross-disciplinary studies and improves intra-disciplinary collaborations.
There are other benefits as well, such as improving student learning by allowing access to primary data, improving preservation of one’s work through sharing and digitization, and improving engagement with the public by increasing transparency.
However, there are some valid concerns and challenges with sharing information and data freely, especially for graduate students. The actual sharing of one’s data can be difficult since it requires creation of clear metadata standards, online server space for dissemination, and resources for making data more accessible and linkable. If graduate students are not readily provided resources or help with digitization and dissemination, it can be daunting (especially when we have so many other priorities to address while writing). The most challenging are the sociological concerns, such as the lack of rewards for publishing and digitizing data and, more problematically, the fear of being scooped. The fear of data theft is a real concern, especially for graduate students who are trying to make their mark on their discipline. However, if we promote share-alike terms and attribution of data, we can lessen this threat, or we can wait to publish data until after the dissertation is done.
By linking our data and making it open, we create a more efficient and productive academic environment, which can be extremely beneficial to graduate students. I’m not saying that this is the path for everyone, and there are valid concerns with sharing one’s research, especially if you are in the process of writing your dissertation or thesis. But consider the benefits of linking and opening your data: there may be an amazing collaboration or other professional benefits out there for you!
Do you think that you’ll make your data accessible online and available after the dissertation? Or is this trend potentially problematic for graduate students?
[Image by Flickr user Ryan Hyde, used under Creative Commons licensing.]