“By sharing our data, and doing this in an open, public, community fashion, we can determine the best practices for our field”
Name: Prof Laura A. Janda
Position: Professor of Russian Linguistics
Institution: UiT The Arctic University of Norway
ORCID ID: http://orcid.org/0000-0001-5047-1909
An interview with Prof Laura A. Janda on 18 April 2017
Another experience that has pushed me in the direction of Open Data is my work as associate editor of our journal, Cognitive Linguistics. Our journal has always been data-friendly, and no issue has ever been published without a statistical analysis of data. But around 2008 we crossed the 50% line for the first time: over half of the articles published in our journal now involve statistics, and we are probably never going back. While I don't think we will ever reach 100%, the journal is now very much dominated by statistical analyses of data.
Also, I have found that it's a problem, both as an editor and as a reviewer, if you can't see the data. It's very important to provide access to the data so that others can see how the analysis was done and learn from it, or even try to replicate it. In this way, we support the scientific method and the integrity of our field overall. Open data is also important for transparency, to avoid fraud. We haven't had any big scandals in linguistics of the kind we have seen in medicine, for instance, but it's always possible for people to fudge their data a little. This is harder to do if the data is all open and public.
Working with TROLLing has even changed my own working habits. When you do a lot of theoretical and statistical studies, it can be hard to locate your own data, or even to understand how it was put together if it isn't annotated well enough, especially once you've moved on to something new. Today, of course, I know exactly what all those fields mean, but will I know in a month, in a year, or in 10 years? One nice thing about having a resource like TROLLing is that it really forces me to upload all my data to a place where I can find it again, and where I can direct others to find it. Also, if I have gone through the exercise of annotating the data so that it is clear even to somebody who doesn't know me and has no previous knowledge of my data, then, hopefully, it will be clear enough for me when I revisit the data later. Nowadays, rather than having to dig around in my own files, I can simply go back to TROLLing to find my own data and code, and I know it's always there and it's safe.
I use my open data in teaching, too. The textbook I use in my course comes with some datasets and analyses for students to work through. But I have my own data, and using your own data is different, because you know it so well. I give my students a dataset for each type of statistical analysis they are to learn: my own dataset and my own code, and then we work through it together. I can answer all their questions and really give them a full experience of what it's like to work with data and code. You can't just collect data and shovel it over to some statistician; say the word “verb,” for example, and the shutters go down, because he or she may not understand the linguistic terms. You have to analyse the data yourself, because the statistician will never understand it the way you do. You also need some idea of which models you are going to use in the end, so that you collect data that will be amenable to that kind of modelling.
One of my colleagues said, when we were making the instructional videos: “Laura, you have to make these instructional videos such that even your grandmother could upload data onto TROLLing.” I think we came pretty close to that; with the videos, the process is fairly self-explanatory. I have always felt that research and teaching go hand in hand, and I have never been involved in a research project that didn't have some sort of teaching angle to it. Conversely, whenever I am teaching, I always try to think about what we still need to learn. That is one of the great things about teaching: you can see the gears turning in the students' heads, and you can see that they see things from a different perspective. I learn from them constantly, and that in turn feeds back into both teaching and research. It's a continuous cycle.
The students are therefore getting a simulated experience of hands-on work with data. They get the data and the code, and we go through it together, all sitting there with their computers open; it's a hands-on experience of working directly with the data.
I want to mention the dissertation by Jaap Kamphuis that was defended in Leiden. I had met the author at a couple of conferences, and we were familiar with each other's work. I was asked to be an examiner at his dissertation defence, and while I was reading through my copy, I realised that he had taken the method we had used, and that he had gotten it from TROLLing, our open data site. He had applied the method to different data and used it in a different way; it was so exciting that I practically cried! This wouldn't have happened without TROLLing. He might have read my article, but then he would have had to call me to find out what methodology I'd used. Instead, he was able to go to TROLLing, download the data and code, see how it was done, and say: “Yeah, I can do the same.”
Copyright: Creative Commons CC-BY Licence University Library, UiT The Arctic University of Norway