This workshop report was written by Arnošt Štanzel for OstBib and made available under a CC-BY 3.0 licence. It has been republished here with minor stylistic adjustments, having otherwise been left unchanged.
The workshop "New Horizons for Research on and in Central and Eastern Europe: The Role of Research Data" was organized by the project team of OstData (funded by the German Research Foundation, DFG), a project to establish a research data service for East, Central, and Southeast European Studies. The workshop's topics included, among others: a) strategies to strengthen the inclusion of research communities in area studies on Eastern, Central, and Southeastern Europe in efforts to build pan-European research data infrastructures, and b) ideas to widen and increase recognition for the publication of research data in the humanities and social sciences (for the full program, see here).
The workshop brought together national and transnational initiatives representing digital research data infrastructures, such as the European Open Science Cloud (EOSC), the European Strategy Forum on Research Infrastructures (ESFRI), or the Nationale Forschungsdateninfrastruktur (NFDI) in Germany, discussing the following questions:
- What insights can be gained from digital efforts in Eastern Europe to build up pan-European data infrastructures?
- How can a new 'digital' West-East divide be prevented from emerging?
- What are the challenges of translating between heterogeneous disciplinary cultures in the creation of comprehensive digital infrastructures in the humanities, in economics, and in social sciences?
a) Best practices on strengthening the inclusion of research communities in Eastern, Central, and Southeastern Europe in pan-European research data infrastructures
Several presenters demonstrated best practices on how to strengthen the inclusion of research communities in Eastern, Central, and Southeastern Europe in efforts to build pan-European research data infrastructures. One example, the European Holocaust Research Infrastructure project (EHRI), unites 26 partners from 13 countries, linking archival resources and archives by providing extensive metadata in order to overcome the fragmentation of Holocaust resources. By working with a structure of regional hubs, e.g. in the Baltic States, Eastern Europe, and Russia, the project suggests a possible way to include research communities in Eastern, Central, and Southeastern Europe in a transnational and interdisciplinary research infrastructure. Another example, the World Historical Gazetteer (WHG), also stressed the advantages of working with focus regions and is currently waiting for funding to establish its own domain on Russia, Eastern Europe, and Eurasia (REEES).
Additionally, EHRI's approach is marked by the experience that building a digital infrastructure has to be accompanied by in-person interaction, e.g. by building up a transnational community that actually uses the infrastructure. The example of the Electronic Repository of Russian Historical Statistics (RiStat) underlines this argument, as it points out the importance of data visibility for the relevant scholarly communities. By offering an easily accessible bilingual (English and Russian) platform for statistical data, the RiStat project develops transnational perspectives without investing heavily in aggregating infrastructures.
The example of the Slovenian Social Science Data Archives (ADP) illustrates how a single infrastructure – in this case a data archive for the social sciences – functions as a catalyst for national activities on research data management. ADP is a member of the Consortium of European Social Science Data Archives (CESSDA) and the Research Data Alliance (RDA) and serves in each as a national node for Slovenia, connecting national and international research communities and supporting transnational integration. A goal shared by both ADP and OstData is the aggregation of metadata on research data from one country for re-use in national and international contexts, e.g. by making it available to the European Open Science Cloud (EOSC), thereby enabling the integration of national scholarly communities into European and global infrastructures.
By referencing the Nordic countries, the example of the EOSC illustrates how collaboration can be a way to "punch above one's own weight". Regional cooperation is an opportunity for smaller countries to collaboratively build up digital research infrastructures. Pooling resources can save money and achieve better results compared to national endeavors. In line with this argument, regional cooperation in Eastern Europe, e.g. in the Baltic States or the Visegrad States (Czechia, Hungary, Poland, and Slovakia), could deliver high-end infrastructures competitive with those in Western Europe, thus preventing a new digital West-East divide.
An issue shared by all infrastructures presented is the lack of sustainable funding. How to temporarily work around funding issues may be learned from Eastern European infrastructure projects, as they are often used to working with scarce resources and adopting a pragmatic, step-by-step approach so as not to over-engineer infrastructures from the start.
b) Increasing scientific recognition for publishing research data
With regard to ideas to widen and enhance scientific recognition for the publication of research data, the discussions during this workshop illustrated that scientific communities are still searching for the best way to approach this issue. Although research funding institutions and political leadership predict that data output will become a harder currency in the near future, the publication of research data does not yet pay off for researchers, especially not in Eastern Europe. Additionally, intellectual property rights and data privacy laws create an environment in which data sharing and data publications emerge only slowly due to legal restrictions.
Subsequently, a discussion evolved around the question of how different disciplinary cultures give credit for research results like data. In Germany, for example, a debate is going on in the humanities on how to integrate digital methods and approaches – as an auxiliary science or a discipline in its own right? Thanks to the NFDI application process, awareness regarding research data among historians is rising. The Journal of Digital History (JDH), established because "traditional" journals do not publish research data in the way digitally working historians would like them to, is another example of the ongoing struggle in the humanities on how to include digital approaches. At the same time, the workshop participants argued that the "story" – the narrative quality and basis of history as a scientific subject and discipline – should be pursued at the same pace as innovative tools, code, data, and metadata. And, in order to close the gap between traditionalists and digital humanists, the jargon used in research data circles should be reduced, as it may be difficult to understand for non-humanists.
In this context, a discussion on the definition of data quality emerged. The workshop participants suggested that the value of data depends on successful and frequent re-use. In some ways, different re-use scenarios and/or options can serve as a criterion for data quality. Repositories should open up data to everyone, but the process of accepting and publishing data should be curated and accompanied by peer-review processes to ensure high data quality. JDH acknowledged its struggles with the peer review system as an established framework of quality management for books and articles: how should peer review processes for research data and data papers be organized, and how extensive should a review be – focusing only on the text, on the datasets concerned, or even on the code? The more extensive the review is supposed to be, the more difficult it is to find reviewers who are competent in all aspects. Building a pan-European data culture will thus have to be accompanied by the development of a pan-European data review culture, which needs to emerge from the scientific community itself.
A task for the future: research data infrastructures in a European perspective
The workshop showed examples of how to provide successful research data infrastructures on and in Central and Eastern Europe. However, it also pointed to the huge challenges lying ahead. For example, long-term archiving of research data (including software) is still an open issue. The JDH admitted that guaranteeing that code stored in applications like Jupyter notebooks will still run in five years is a task yet to be solved. Several participants underlined that research data created and worked on today will be needed by later generations to understand and reproduce the research practices and methods of our time.
Even more importantly, the participants stressed, building digital infrastructures means not only servers and code, but essentially contributions by the involved communities as well. Re-use of data requires the active contribution and collaboration of researchers and good practices of re-using and re-purposing, not only functioning long-term storage systems. The training of new generations of scholars has to ensure that scholars know how to actively use data. Thus, the humanities and especially historians need to become aware of the innovations traditional methods of history can bring to data science. Transferring source criticism into the digital realm is not to be underrated, and, vice versa, data should be treated as (soon to be historical) sources, with a set of accompanying methods in which the historical disciplines specialize.