Lesson 2.3 Introduction to Open Science and its fundamental concepts: Open Access and Open Data
What is Open Science
The concepts of the public domain, Open and Free licences, and Creative Commons licences seen in previous lessons share the philosophical postulate of openness: benefits to society are maximised and equally distributed when knowledge can circulate freely. This postulate applies to science as well, where enabling open access to the materials and results of scientific studies is of paramount importance, not only for disseminating scientific studies to civil society, but also for improving the reliability of scientific discoveries: open access to scientific resources enables replication and reproducibility of studies.
In addition, the advent of the internet and digital technologies has increased and extended the openness of science in new ways. Scientists nowadays can easily exchange data, comment on studies, share their own publications via the internet, and make use of digital tools and platforms.
This new paradigm is called Open Science: its development has been reinforced by recent calls for the global governance of science from European institutions, which consider the transition towards Open Science a fundamental step towards fostering knowledge circulation as a driver for faster and wider innovation (see http://ec.europa.eu/programmes/horizon2020/en/h2020-section/open-science-open-access and https://www.fosteropenscience.eu/content/open-science-scientific-research).
The Open Science movement is still in its early stages, and no official definition is yet widely accepted (a first definition and taxonomy of Open Science is available at https://en.wikipedia.org/wiki/Open_science). The scientific communities, together with institutions (see, for instance, the work of the European Commission), have started a dialogue to build a common infrastructure that will allow scientists, companies and citizens to access a shared pool of scientific resources: the Open Science Cloud. However, the development of the Open Science Cloud, which will be the most relevant application of the Open Science paradigm, will still take some years.
What is Open Access
According to Peter Suber, Open Access (OA) refers to online research outputs that are free of all restrictions on access (for example, access tolls) and free of many restrictions on use (for example, certain copyright and licence restrictions) (Suber, Peter. ‘Open Access Overview’, 2011).
The European funding programme Horizon 2020 (the most relevant programme for research in Europe) recently provided Guidelines on Open Access to Scientific Publications and Research Data that require that ‘each beneficiary must ensure open access to all peer-reviewed scientific publications relating to its results’. This OA mandate is implemented in two steps, which may not be simultaneous: i) depositing publications in repositories, ii) providing Open Access to them.
Regarding the first step, researchers can refer to the Open Access Infrastructure for Research in Europe (OpenAIRE) to find a suitable repository (allowed archives are institutional, subject-based or centralised repositories); repositories that claim rights over deposited publications and/or preclude access are not valid archiving options. Other useful listings of repositories are the Registry of Open Access Repositories (ROAR) and the Directory of Open Access Repositories (OpenDOAR). A well-known OA repository, for example, is Zenodo.
The second step can be done by opening up the full text of the item in the chosen repository (‘Green’ Open Access), or by publishing the research work in Open Access journals (‘Gold’ Open Access). So-called ‘hybrid’ journals are also a valid option (i.e. journals which, although they use a revenue model based on subscription, also offer the possibility to provide Open Access for individual articles, provided an article processing fee is paid). Where Green Open Access (via a repository) is chosen, beneficiaries must ensure Open Access to the article within at most 6 months for articles in science, technology, engineering and mathematics (STEM), and within 12 months for articles in the humanities and social sciences (HaSS). The policies of publishers and of individual journals with respect to self-archiving (depositing articles in repositories), including required embargo periods, are available in the Sherpa RoMEO database.
What is Open Data
Open data is data that can be freely accessed, used, modified and shared by anyone for any purpose, subject only, at most, to requirements to provide attribution and/or share-alike. Compared to proprietary frameworks, open data is characterised, from both a legal and a technical point of view, by lower restrictions applied to its circulation and reuse. This feature is supposed to ultimately foster collaboration, creativity and innovation.
According to the Open Definition, to be open, data must be:
- legally open: that is, available under an open (data) licence that permits anyone freely to access, reuse and redistribute it;
- technically open: that is, available for no more than the cost of reproduction and in a machine-readable and bulk form (see http://opendatahandbook.org/glossary/en/terms/open-data/).
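‘Machine-readable’ means a program can parse the data directly, without manual transcription. A minimal sketch, using a hypothetical open dataset distributed as CSV (the cities and figures are invented for illustration):

```python
import csv
import io

# A hypothetical open dataset in CSV: a machine-readable, bulk format
# that any program can parse without manual transcription.
raw = """city,year,population
Turin,2020,870000
Ghent,2020,263000
"""

# Parse every row into a dictionary keyed by the header line.
rows = list(csv.DictReader(io.StringIO(raw)))

# Because the data is structured, it can be processed programmatically.
total = sum(int(r["population"]) for r in rows)
print(total)  # 1133000
```

A scanned PDF of the same table would carry identical information but would not be technically open in this sense, since extracting the numbers would require manual work.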
The Open Data Handbook outlines three main characteristics for data to be open:
- Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
- Re-use and Redistribution: the data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.
- Universal Participation: everyone must be able to use, re-use and redistribute – there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (for example, only in education), are not allowed.
A key concept for understanding Open Data is ‘interoperability’. Interoperability denotes the ability of diverse systems and organisations to work together (inter-operate). In this case, it is the ability to interoperate, or intermix, different datasets. Interoperability is important because it allows different components to work together. This ability to componentise and to ‘plug together’ components is essential to building large, complex systems. Without interoperability this becomes near impossible, as illustrated by the famous myth of the Tower of Babel, where the (in)ability to communicate (to interoperate) resulted in the complete breakdown of the tower-building effort.
We face a similar situation with regard to data. The core of a ‘commons’ of data (or code) is that one piece of ‘open’ material contained therein can be freely intermixed with other ‘open’ material. This interoperability is absolutely key to realising the main practical benefits of ‘openness’: the dramatically enhanced ability to combine different datasets and thereby to develop more and better products and services (these benefits are discussed in more detail in the section on ‘why’ open data). It is necessary to distinguish between legal interoperability, technical interoperability and semantic interoperability (see F. Morando, https://www.jlis.it/article/view/5461).
Providing a clear definition of openness ensures that when you obtain two open datasets from two different sources you will be able to combine them, and it ensures that we avoid our own ‘Tower of Babel’: lots of datasets but little or no ability to combine them into the larger systems where the real value lies.
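The practical payoff of interoperability can be sketched in a few lines: two hypothetical open datasets from different publishers (station names from one, air-quality readings from another; all values invented for illustration) can be intermixed because both are machine-readable and share a common key.

```python
import csv
import io

# Dataset 1 (e.g. from a city administration): station metadata.
stations = """station_id,name
S1,Central
S2,Harbour
"""

# Dataset 2 (e.g. from an environmental agency): sensor readings.
readings = """station_id,pm25
S1,12.4
S2,8.1
"""

# The shared 'station_id' column is what makes the datasets interoperable.
names = {r["station_id"]: r["name"]
         for r in csv.DictReader(io.StringIO(stations))}

# Join readings to station names to produce a richer, combined dataset.
combined = [
    {"name": names[r["station_id"]], "pm25": float(r["pm25"])}
    for r in csv.DictReader(io.StringIO(readings))
]
print(combined)
```

Had either publisher used an incompatible identifier scheme, or a closed format, this join (and any product or service built on it) would not have been possible.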
Several public institutions and organisations around the world develop open data portals. Open Data portals facilitate access to and re-use of public sector information. These are web-based interfaces designed to make it easier to find re-usable information. Like library catalogues, they contain metadata records of datasets published for re-use, i.e. mostly relating to information in the form of raw, numerical data and not to textual documents. In combination with specific search functionalities, they facilitate finding datasets of interest. Application Programming Interfaces (APIs) are also often available, offering direct and automated access to data for software applications.
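As a concrete illustration of such API access: many public Open Data portals run the CKAN platform, which exposes a JSON search API. The sketch below only constructs a CKAN-style request URL (the portal address and query are illustrative assumptions; no request is actually sent):

```python
from urllib.parse import urlencode

# Hypothetical portal address; real CKAN portals expose the same
# '/api/3/action/package_search' endpoint for dataset searches.
BASE = "https://example-portal.eu/api/3/action/package_search"

# Search for up to 5 datasets matching 'air quality'.
params = {"q": "air quality", "rows": 5}
url = f"{BASE}?{urlencode(params)}"
print(url)

# In a real application you would fetch this URL (e.g. with
# urllib.request) and read the dataset list from the JSON response.
```

This kind of programmatic access is what allows software applications to discover and re-use portal data automatically, rather than relying on manual downloads.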
Open Data portals are an important element of most Open Data initiatives. While supporting the policy by offering easy access to data published, they can also work as a catalyst triggering the publication of more and better quality data. For administrations obliged or willing to disseminate their data, they offer the advantage of providing public access without the need to reply to individual requests for access to data. Open Data portals are mainly used by public administrations at European, national and local level, as they publish a large variety of data. But more and more companies are opening up some of their data for developers to re-use.
Notable examples of Open Data portals maintained by public administrations in Europe are:
At all administrative levels, the public sector is one of the major producers and holders of Open Data, ranging from maps to company registers. During recent years, the amount and variety of Open Data released by public administrations across the world has grown tangibly: the Open Data Census by the Open Knowledge Foundation gives an overview of the large amount of publicly available data (Open Knowledge International is a global non-profit organisation focused on realising open data’s value to society by helping civil society groups access and use data to take action on social problems). Moreover, a series of indicators has been selected to measure Open Data maturity across Europe. These indicators cover the level of development of national policies promoting Open Data, an assessment of the features made available on national data portals, and the expected impact of Open Data.
Several institutions, such as the Open Data Institute (ODI) and the Open Knowledge Foundation (OKF), are working with companies and governments to build an open, trustworthy data ecosystem, where people can make better decisions using data and manage its harmful impacts. They also promote educational and training initiatives for citizens (see, for instance, the School of Data).