Openness needs to be included in our core activities, in everything we do

Name: Mikko Tolonen
Position: Professor of research on digital resources
Institution: University of Helsinki, Department of Modern Languages
Country: Finland

ORCID ID: http://orcid.org/0000-0003-2892-8911

An interview with Mikko Tolonen on 6 June 2017

Why are you so keen to share data? What experiences made you realise its importance?

My educational background is in the traditional humanities, and I got my PhD in history. When I started working with digital humanities at the Helsinki Collegium for Advanced Studies, I was looking for someone who was both familiar with data issues and interested in collaborating with humanities researchers. This is how I met Leo Lahti, who is one of the first Open Science advocates in Finland. His background is in the natural sciences, and I soon realised that the humanities are, in many ways, far behind the natural sciences when it comes to research data.


Historians, for example, are not used to thinking about research data, simply because they think they have no research data. They think they have only research sources, research notes and a publication. But this is not the case. I’ve been trying to promote awareness of research data among my own research group as well as among other researchers. Open Science should not be understood only through publications. Instead, the entire process from the raw data to the final publication is important. Research data is the key, and this has been better understood in the natural sciences. Humanists, in turn, are facing a comprehensive cultural change, as they have only just started this work. In the humanities we are used to thinking that one person carries out the entire research project. But in digital humanities, a number of researchers from different scientific backgrounds take part in the research. This requires constant communication, sharing of data, and continuous cooperation with other research teams. Compared with traditional historical research, this is quite a different world.

How are you involved with Open Data?

The work of my own research group is related to intellectual history, and the research data consists largely of textual material. We have a wide-ranging collection of literature, such as the 18th-century British collection listed in the ESTC (the English Short Title Catalogue). We also use other library databases as research material, e.g. Fennica (the National Bibliography of Finland). Using the data requires a lot of data cleaning. We have to process the data in the library databases before we are able to combine it with full-text material. Sharing the research data generated here is important because, as we process the data, we can make mistakes. Hence, you have to be able to trace back what has happened during the process. We would prefer open raw data, but we cannot confine ourselves to open sources only. So the raw data may not always be open, but the cleaning code always is. In our GitHub repository you can see what has been done. Our work becomes open through the code we generate.

In addition to research work, I am involved in the Open Science working group of Open Knowledge Finland (OKF). OKF is successfully promoting Open Science in Finland. I have also had an active role at the University of Helsinki, where I am involved in several teams related to Open Science policy-making. I feel that I am part of a common movement for Open Science. In autumn 2016, when the FinELib (the Finnish National Electronic Library) consortium was negotiating with Elsevier, together with OKF we organised a petition called Tiedonhinta.fi, in which some 2,000 researchers showed their support for the FinELib negotiators. There is a lot of discussion about Open Science, but activating researchers is the most important thing.

What frustrates you most about the current systems? If you could change one thing, what would it be?

There is much talk about Open Science, and opening the research data is the next step. However, collaboration in the humanities is still in its early stages. We have to spend too much time negotiating the terms on which we can use the data. The fear seems to be that opening data will lead to it spreading in an uncontrolled manner and to copyright in the data not being respected. It is not sufficiently understood that researchers need access to the research data for their work.

Thus, the availability of data has often been the bottleneck in our work. For example, it took us a lot of effort to get access to the Gale company’s ECCO data (Eighteenth Century Collections Online). They repeatedly answered that our research plan sounded good but that they could not give us the data dump we needed. Yet they have not really developed data analysis tools in the last 15 years. Their revenue model is based exclusively on selling licences to researchers through libraries. Historians may not even have realised that there could be other uses for literature collections. Eventually we managed to get the ECCO data, but many other researchers still have no access to it. And there is no Open Science without access to the raw data. From the point of view of text and data mining, publishers’ practices are time-consuming. At the same time, I believe that our work with ECCO will also benefit Gale in terms of new tools and applications generated for the use of their sources.

Who or what (project / service) inspires you and makes you optimistic about the future of Open Science?

I would like to mention the scientific community working with the R programming language. The community is able to create small-scale data tools for the needs of different disciplines without the involvement of a commercial operator. It is inspiring to see how such communities grow and spread across disciplines. Drawing on Leo Lahti’s experience, our research group started to apply tools created for bioinformatics to the analysis of library databases. Since then, people from different disciplines with shared interests have been cooperating with us. And there is no need to draft separate contracts for this cooperation. It is pleasing that you don’t always have to start with time-consuming negotiations to get things done.

Another example is our COMHIS Collective research project, which can also be seen as an Open Science project: https://comhis.github.io/. There are researchers in the group who receive COMHIS project funding, but there are also researchers who are not formally tied to the project. However, we all work together; we discuss project-related matters on Slack every day and take steps forward on many fronts. This, in my view, represents an Open Science ethos.

What still needs to be done to get more people to share and open up their research data?

Change happens only when Open Science becomes a practical part of the researcher’s education and when it is proven that Open Science is the better way of doing science. Open Science should not be seen as a separate element of scientific education. Openness needs to be included in the core activity, in everything we do. When research methodology is taught, the role of data should be just another aspect of the teaching, including data sharing, long-term storage, reproducibility, and the principles of scientific inquiry overall.


For the new generation of researchers, Open Data issues will be an integral part of their work. This will also change the field of humanities. It is perhaps unrealistic to expect that scholars who have built their careers in a more closed system will just change their ways. Therefore, the most crucial thing in all disciplines is to show that the new way is the better way. And the new way is based on the best principles of Open Science. The Human Genome Project is often used as an example. The field of genomics research developed significantly when laboratories began to compete on openness as well. In this way, competition can be used to benefit everyone.

And finally, what would a world with far more Open Data look like?

Services that improve the quality of people’s lives will evolve in the world of Open Data, and humanists have a role in this development. Nowadays, there are a lot of Open Data competitions, but I feel that the winner is often the one who reinvents the travel route planner app. We can do almost anything with Open Data, but it is only human to reinvent the wheel. This is partly because there is only a limited number of people who are able to make use of Open Data. When data processing becomes more familiar across disciplines, both the perspectives on Open Data and its outcomes will change. Engineering-driven smart city planning might turn into something very different when perspectives change, e.g. when it is viewed through history or cultural heritage. Historiography and conceptions of history are changing as well.

All things considered, the world will become a more democratic place with Open Data. Openness will allow anyone to form a well-considered opinion on matters of importance. If factual information is available, there is, for example, no need to fight populism just by shouting.

Copyright: Creative Commons CC-BY Licence, University of Helsinki

Tags: Finland, Open Data, Open Science, TDM, accessibility, advocacy, applications, change, code, collaboration, commercial, community-building, competition, copyright, cultural change, data cleaning, data curation, data processing, democracy, disciplinary differences, duplication, education, errors, fear, frustration, full text, licensing, motivation, policy-making, programming, re-use, services, sharing, teaching, tools, value
