“As open as possible, as closed as necessary”
Name: Dr Marta Teperek
Position: Research Data Facility Manager at Cambridge; now Data Stewardship Co-ordinator at the TU Delft Library
Institution: University of Cambridge; now Delft University of Technology
ORCID ID: http://orcid.org/0000-0001-8520-5598
An interview with Dr Marta Teperek on 19 April 2017
Why are you interested in data sharing, how are you involved with Open Data?
And because of this, there’s a lot of pressure on researchers. In a way, it’s very difficult to blame researchers for this tendency because of the reward structure. What motivates people is to have this nice story, this nice package that can be sold … you know, high-impact factor. The question is, then, are we rewarding researchers who are doing quality research? I don’t think it’s the case; I think we are rewarding a nice shiny paper, not really the quality of the whole process. We are no longer asking questions whether the science behind it is really solid, whether your research has been transparent, whether you have implemented enough controls.
“We are no longer asking questions whether the science behind it is really solid, whether your research has been transparent, whether you have implemented enough controls.”
And what happens with all the things in between? Very often, what we publish in papers is just a beautiful story. That’s how we got from point A to point B, but we are not talking about all of the other things, the negative results, non-positive results that we have experienced in between that have led us to change our hypothesis, for example, which is extremely valuable.
So I guess I became interested in data sharing, in a way, because of frustration, because I wouldn’t be able to fix the problem as a researcher. Perhaps I could have more influence if I stepped outside of academia and tried to promote open research and data sharing, which is an element of open research. As a result, that’s how I got a job in the Office of Scholarly Communication at Cambridge, and my role now is to encourage researchers to share their data, to be more transparent, to encourage them to be more open with their research practice. And hopefully, the idea is that, little by little, we’ll be able to change the practice and make sure that researchers are rewarded, not only for the publication and the high-impact factor rating, but also for the quality of the whole research process.
How do you attempt to get researchers to make their data readily available? What are some of the things you do?
In terms of selfish benefits – trying to ‘sell’ the whole story – it’s an attractive thing for them as well. It’s not only about the longer-term benefits that are perhaps less tangible and more difficult to measure. How do you measure community change, for example? That’s a bit of a difficult one. But I’m also trying always to stress the immediate benefits; for example: you will get more citations if you share your data; you will be better known; you will have a public profile. We often try to do simple activities to demonstrate, especially to young researchers, how important sharing is. One of the activities we do is to ask researchers to Google themselves, and they usually find some funny Facebook images. And then we try to reflect on that: “If you had shared your research more publicly in terms of datasets, protocols, blog posts, or if you had a Twitter account talking about your interests, then people would be discovering this other part of you, not only Facebook images.” This shows researchers that their public presence is also affected by the degree of sharing and openness they have in their research career, and is also something valuable. People are really convinced about this. I mention that the selfish benefits involve starting from the greater good to the little problems which are very tangible: you can make a change tomorrow if you share your dataset, for example, and get a DOI; you will be able to put it on your CV. That’s quite important for researchers. So I try to use a mixture of the bigger goals together with the simple things that can make a difference for them.
I like that balance between goals for society and the individual researcher; make it better for everyone and help yourself out at the same time – empowering researchers to make a change.
What makes you optimistic about the future of Open Science and Open Data?
There’s a lot that makes me optimistic and excited to be pursuing this career right now. There has been a lot of movement from funding bodies, which increasingly recognise the move towards Open Science. There were some calls from NWO, which is a Dutch funder that recently gave 3 million euros for replication studies, which I think is quite a notable goal. More and more funders, such as the famous Wellcome Trust in the UK or the NIH in the United States, recognise pre-print publications – publication venues like bioRxiv, for example, as something that is sufficient for researchers to put on their CVs when applying for grants. What I also really like is the emphasis from the funders’ side on sharing other outputs, other data. For example, more and more funders have requirements for data sharing, for source code sharing, therefore understanding the whole reproducibility issue. So I think there are a lot of interesting moves from the funding bodies.
“More and more funders have requirements for data sharing.”
There are, of course, interesting initiatives from organisations such as the European Commission. And I’m sure you’ve heard of the European Open Science Cloud. It recognises the name ‘Open Science’ and how important it is. So there are plenty of movements. There’s a statement about making all articles available as Open Access by 2020, so there is a huge movement by the European Commission to increase openness. The UK is, of course, the leader in terms of Open Access, and in terms of data sharing, so I’m quite confident and happy to see that publishers are also taking a lot of initiatives. Many things have to be changed but, for example, there have been significant good movements from one of the big publishers, Springer Nature, who now have rolled out data-sharing policies across all of their journals. They really want to ensure that what they publish is more and more reproducible.
In terms of things that really make me optimistic: the change you can observe and measure with years. A good illustration of this is in the life sciences. We have fine repositories where people share, but I think that, within that community, it’s normal that people share. You wouldn’t question why you would share; it’s normal. You do think about how you can improve the practice – how you can share better, and how you can make your publications more interactive. So, for example, instead of having a PDF with figures, maybe you could click on the figure and the code will be there and all the data will be there, and you can reproduce the results while reading the paper. It’s all about sharing better.
However, in the life sciences, sharing data is not yet seen as the norm by everyone, but more and more people do it and don’t ask “Why would I share?” They ask: “Did I share enough? Did I share properly? Can I make it better?” That feels completely advanced, and that’s probably because of the community, the effort that people have made and the fact that every single researcher would share and accept that: “If I don’t share, people will think I’m dodgy.”
“Did I share enough? Did I share properly? Can I make it better?”
It becomes the norm. That’s why I’m really hopeful now and I think it’s very important to remind individual researchers that they can make a difference, because it’s not as if somebody told them: “You must do this.” That’s something that the community led themselves. Each one of us, as individuals, has the power to change things slowly. But if you look at 10 years’ difference – where the life sciences are at the moment compared to where they were 10 years ago – it’s incredible. So I’m really quite hopeful when looking at the evidence we currently have that the change is possible.
You already mentioned a bit about how Open Data can benefit research careers. Is there anything else you would like to add to that?
Related to that is another aspect that is quite positive about change: more and more research institutions now include the commitment to sharing and openness as part of the job description. There are even some entire institutes, such as Neuro, which is the neurological institute in Canada, founded to make sure that all their research is made openly available. These kinds of initiatives really make me happy, along with seeing that researchers committed to sharing are rewarded, that they get the jobs, they are promoted, and they can progress with their careers. Of course, that’s not the majority, but the trend is there and, hopefully, with more and more people being concerned about research integrity, it’s going to become a greater concern for research institutions as well.
Can you think of any downsides to Open Data? Could sharing data potentially hold back careers?
“As open as possible, as closed as necessary.”
In terms of the perceived disadvantages of sharing, there is of course a lot of fear of competition, especially in competitive fields. People are sometimes concerned, “Oh, if I share I might be scooped.” But to be honest with you, whenever these arguments happen, we’re always asking the question: “Can you please provide an example of when you shared your data and somebody scooped you?” Actually, so far I’m still lacking the evidence. While talking with thousands of researchers in Cambridge, I have yet to see an example of this really happening.
“Can you please provide an example of when you shared your data and somebody scooped you?”
Perhaps one other counter-example, another benefit of sharing: One of the researchers who was a bit cautious, when I initially mentioned to him that he might want to share his data, is now very keen on doing this, and he’s sharing his pre-print papers as well. What really convinced him is that if he would share his paper and his data set and get a DOI, it becomes sort of a time stamp for him. Because he works in a very competitive field, it then shows evidence that he was actually first. You know how it is when you want to submit your paper to a competitive journal, like Cell or Science; you could wait years for it to be accepted because of the peer review and so on, so sometimes getting it out first and getting it time-stamped, e.g., “I was there,” actually puts you ahead of the competition.
I guess there is also a lot of misconception and fear. Perhaps the most important thing when talking to people is never to neglect their fears or questions, but try to understand why they are afraid. For example, what are the potential risks? And just talk – listening and talking and seeing how we can work together to make sure that we share responsibly at the right moment.
What do you think would happen if public research data were to remain closed, and in contrast, what do you think a world with far more Open Data would look like?
“If you want to move human knowledge forward, you have to share your discoveries with others.”
Currently, with the digital world, saying we can just share the PDF of the paper is certainly no excuse because we can share more. Everything is digital. We all do our results and our data analyses on our laptops, whether it’s just a simple spreadsheet or there are some really big data calculations. Therefore, I think a closed world would bring a lot of potential loss of money and missed investments. I guess that comes back to what would happen if everything were made publicly available. Perhaps we are not sharing enough non-positive results … and this creates a lot of wasted time. An example is a colleague of mine who was doing his PhD. He was spending a lot of time testing hypotheses, and it turned out that another PhD student in the lab 10 years ago did his PhD thesis, which of course is not really accessible, and found that this experiment had been done and doesn’t work. The hypothesis is negative; there is no association between A and B. If these results were published, of course people wouldn’t be wasting their time and wasting public money to repeat the same experiments. This, to me, is the biggest waste of public resources.
On the other hand, I guess some people fear that, if we make absolutely everything publicly available, we wouldn’t be able to sort out the valuable things from the non-valuable things. Would there be a massive flood of information? How would we be able to find what interests us most? This is something we need to recognise, and we need, as a community and as infrastructure providers, to really work to make sure that there is a system for efficient searches and efficient indexing, perhaps allowing people to get scores, not for the impact that they are making, but maybe for the quality perceived by the community. When you look at companies like Airbnb, for example, it’s quite amazing how people can rank certain properties. Indeed, when you search for accommodation on Airbnb – or Booking.com – you can actually say, “I’m just interested in properties that have a certain number of stars.” Currently, we don’t really have any metrics like this when we want to access research content, because we are just using the biased metrics provided to us by the journals that of course want to sell their content.
Perhaps having some independent metrics provided by researchers themselves, assessing the content and saying, “I value this,” or “This was very helpful,” “This was a good quality data set,” or “Really bad description,” or something like that, would be helpful. ‘Likes’ or recommendations, or perhaps some sort of badges rating – that’s something that we really need for the future. I’m talking about a world when everything would be open. Some people might perceive a threat of the sudden flood of information; yes, that is something we have to think about now, and there are some efforts trying to help with the problem already, but we need to be clever. We need to be more like start-up companies in terms of how we present our content to others.
Can you think of obstacles that may be preventing researchers from making their data openly available?
Coming back, let’s say, to a busy researcher’s perspective, they will not have the experience to do so. They are very good at doing their research. They are very good at testing the hypotheses and designing the best methodologies to do that, but they may not be experts on data anonymisation, data storage, or putting together a consent form in a way that is understandable to others. Sometimes there is a lack of support for sharing data amongst researchers. I suppose what we would like to achieve is to make sharing and the infrastructure around it invisible so that it becomes a natural part of the research process where they don’t need to make much extra effort. Help needs to be there whenever it is needed; although it’s not that they have to desperately go looking for it at Cambridge. The lack of sufficient resources and the lack of knowledge, I think, come back to the fact that perhaps there are not sufficient resources available for training, for up-skilling researchers, for providing them with the right tools. It’s quite a bit of a challenge to make sure we address, certainly, having the machine available, for example, the repository that can accept data. But everything that happens before – the right training, the right skills, the right tools readily available to researchers – is something that needs to happen before we can even start talking about sharing. This requires a lot of resources, a lot of money. When you think about proper training, face-to-face, I guess some people prefer that, but it requires a lot of effort.
What frustrates you about current systems in terms of holding data sharing back?
The other problem is that very often we think about the amazing tool that would meet our demands, and we forget about the need for skills, for training. Of course it’s much easier, much more cost-effective to use the new tool which is very fancy, but we really forget about the need for cultural change. And, as I mentioned, you achieve cultural change little by little, by being very patient. I guess this comes back to the policies, perhaps. Frequently, when publishers’ policies are being created, you have very nice responsibilities, let’s say for an institution, for researchers, for publishers and so on – what you would expect these organisations to do in their own world. The problem is that very often these policies are extremely difficult to change in practice, either because of lack of skills or lack of tools researchers could use to adhere to these policies. So you end up having policies just for the sake of having policies. “Oh yes, we are good. We have a policy and we want to be compliant.” That’s not really about this. If we create a policy just for the sake of being compliant, we are really missing the point.
“If we create a policy just for the sake of being compliant, we are really missing the point.”
We should create policies because we know why people have to be advised. Stakeholders have to be consulted when policies are being introduced, and they need to want to have a certain policy. Sometimes I’m a bit worried that we create more and more policies, more and more requirements. If this is not done jointly with the community, then the policies are being perceived as a burden. Policies should be helpful, to encourage good practice. Especially in the UK, there are many policies presently, both from the funders’ side and from the institutional side, which researchers currently perceive as a burden, as a tick-box-striking exercise, and these kinds of policies really miss the point. Policies should come together with resources, with training, with the proper skill sets being given to researchers, which is of course very expensive, so I guess it’s very difficult to convince some organisations. If we really care about changing the culture, we should be investing loads of money into the soft skills – building the expertise.
Do you see funder requirements as impacting Open Data at research institutions?
Institutions have to be sustainable to make sure that they can support their researchers and continue to be excellent at supporting research. But, I guess for this reason, budgets are always tight, so unless funding bodies mandate certain things from an institution, then the institution usually has other priorities than necessarily thinking about the next step in Open Science or Open Data.
One of the benefits of the EPSRC policy was that, for example in our case at the University of Cambridge, we are able to create our data management support services. If the EPSRC didn’t have the mandate that institutions have to support researchers with proper data management and data sharing, I guess we wouldn’t be having this conversation today. Sometimes funder policies, therefore, can be very useful.
So the EPSRC mandate got the University of Cambridge to commit money to support Open Data?
Exactly. And I guess it’s the same for the Open Access policy. Our office is allowed to do all of these amazing things because of the HEFCE policy and funders’ policies. In that way, policies are good, in my opinion, in terms of allowing the institutions to find the right resources to support their researchers. But in an ideal world, you wouldn’t need policies. You would want to do it for the greater good. Unfortunately, the greater good argument works for young researchers but doesn’t necessarily work for senior researchers, but we already had the discussion about what kind of arguments – the stick or the carrot – work for various types of communities.
Is there anything you see that isn't being addressed in terms of data sharing, especially in terms of practicalities or implementation challenges?
What I think is missing at the moment is some sort of network that would recognise these researchers and allow them to connect with their peers more globally – not only within the institution with other peers across different departments, but the problems too; this would differ from discipline to discipline. What would be really good to have, in my opinion, is some kind of community of practice for discipline-specific data sharing. I think in many communities it would be really useful to allow our advocates – people who are really good at doing that – to connect with their peers at other places to get more recognition and more networks, but also an exchange of good practice. This is extremely important for me in my role, to talk with people from other institutions and get good feedback, good ideas, and to share good ideas – what works and what doesn’t work. That really helps us to move much faster. This kind of support for researchers would be extremely useful.
“A community of practice for discipline-specific data sharing.”
The other problem that is not very often discussed is long-term preservation. What I mean here is that very often we have platforms to support preservation of research data, like file format configuration and so on, but science is very often not like archives, like papers, like PDFs that can very easily be preserved and you migrate file formats. More and more researchers are doing their disciplinary research on quite big platforms. For example, in one of our projects at the University of Cambridge, the Open Research pilot, together with the Wellcome Trust, one of their participating research groups is developing a fine platform that mimics the fruit fly brain, on which neurons are being activated. You see it’s interactive: seeing how that neuron talks, how signals move, and then how the fly is able to fly, but this is not the world of Excel spreadsheets or PDFs. So how do you make these outputs available long-term? Funders are not quite happy to commit the resources to long-term preservation, but if funders are not happy, if institutions have tight budgets, who is going to do that? – particularly if our current preservation efforts are mostly focussed on preserving the archives of the past, not really thinking about future problems such as how we preserve what researchers are currently creating. Sometimes we have to think, in terms of preservation, that we are solving the problems of the past and not really investing enough in preserving interactive resources that would be greatly valuable to the community. I guess this would bring up another problem: What’s really valuable? What’s going to be valuable in 20 years’ time? Maybe nobody will be interested in Drosophila neurons, for example.
So I guess that’s the other problem: how do we identify the bits of science that are worth preserving? I suppose we don’t really have the answer to these questions yet.
Copyright: Dr. Joyce Heckman, University of Cambridge. Creative Commons CC-BY Licence.