
Funder requirements definitely change behaviour and they have taken away some of the fear of sharing data

Name: Rosie Higman
Position: Research data advisor, University of Cambridge, now Research Data Librarian at the University of Manchester Library
Institution: University of Cambridge
Country: UK


An interview with Rosie Higman on 10 April 2017

Why are you keen to share data and how are you involved with Open Data?

I’m a research data advisor at the research data facility at Cambridge, so it’s my job to help researchers share their data and manage their data effectively. I upload data to our repository, but also spend a lot of time teaching research students. I also coordinate our Data Champions programme, which encourages researchers to advocate for good data management and data sharing within their department.

Why is sharing research data important?

Sharing data is one of those things that we discuss so much that I don’t think about it anymore; it’s become my default. In some of the sciences it’s important for reproducibility, particularly at the moment when there are concerns around fake news, distrust in scientists, and doubts about global problems such as climate change.

“Some of us have the privilege of having the funding to generate data, and this should be shared with others.”

I think it is therefore very important to have the data in the open and to be better at communicating the scientific process behind it. There’s also a slight social justice / political element to it in that we have a lot of money in the UK in higher education to produce a lot of data. If we sit on our data then that’s not fair to researchers in other countries who do not have the funding available to create data.

“We have a lot of money in the UK in higher education to produce a lot of data. If we sit on our data then that’s not fair to researchers in other countries who do not have the funding available to create data.”

So particularly, when you look beyond the hard sciences, i.e. at the social sciences, which is my background, I think we should be sharing our social science data so that others can do secondary analyses and do more work with it, as otherwise the field won’t progress. Some of us have the privilege of having the funding to generate data, and this should be shared with others.

What is your background in Open Data?

I researched data sharing as part of a Library Sciences degree, which had a social sciences focus. I think it fits quite well in that I also studied politics and development, and when you’re studying politics, a lot of what I was looking at was ‘how do you make society more equal?’ When you start addressing those issues, a lot of what you’re looking at is economics and the distribution of resources. So in many ways I think it’s not that big of a leap because you’re just talking about how to distribute resources. It’s just that the resource in academia is data.

What still needs to be done to get more researchers to share and make their data open?

Researchers are rather busy people. A lot of appeals to self-interest tend to work, so I think two things need to happen now:

  1. Cultural change, which is a ground-up process where researchers start citing each other’s data and where reusing each other’s data isn’t seen negatively. I think this is critical, and I think that’s going to take longer than the other thing that is happening slowly which is…
  2. Funders’ policies being taken more seriously and publishers starting to demand data as a condition of publishing. Publishers are contributing in a positive way if you look at the developments at Springer Nature over the last year or so, and the work Iain Hrynaszkiewicz has been doing. He has been developing a set of standard [data] policies that journals could implement, and he’s now persuading journal editors to adopt these policies internally. What this should mean is that this issue is being addressed across Springer Nature and other journals. CUP and various publishers are looking at their data policies and considering doing something similar.
    This helps encourage more researchers to share their data because, as much as I would like researchers to value their data as an output in itself, at the moment what counts for researchers’ careers are journal articles in particular journals, and the more those journals push this, the easier it gets. But we’re still going to see quite a ‘tick box’ approach until a cultural change comes about where researchers reuse and cite each other’s data. Increasingly, policies are in place and people are depositing data, but the quality is often poor because the motivation is lacking: for example, depositing an Excel spreadsheet with no description. I have no idea how I’m supposed to reuse this. It is better than nothing, but until it becomes standard practice for people to start looking at each other’s files and checking them and expecting to see how dataset ‘x’ relates to the article, I don’t think we’ll get good quality data sharing.

Funder requirements definitely change behaviour. For example, the EPSRC requires Open Data in the UK. If you look at the pattern of who submits to the repository, our biggest submitters would be from Engineering, Chemistry, and Materials Science and Metallurgy – all of the big EPSRC-funded departments. The ESRC, the key UK social science funder, also requires data deposit, but everything goes into the UK Data Service, which is a good home for it as they can deal with sensitive data. So the main thing is to encourage researchers to share their data, and if they want to share it with us locally or nationally, that’s fine.

To some extent funder mandates have taken away some of the fear of sharing data, but it’s still a mixed bag. Some people think that funder policies haven’t helped because it’s more of a tick box exercise and just another thing the funder is asking them to do, which is not what researchers like. Furthermore, there’s no real money budgeted for it. On the other hand, it’s pushed up Open Data much higher on the agenda so I think that has helped.

Do you think Open Data, as it is now, can benefit a researcher's career?

At the moment, the motivations for sharing data mostly have to do with replication and with your data being discoverable, which can lead to more collaborations and possibly additional citations. However, what I think is currently more important is that people are able to reuse your data and work with you on it. It would also benefit people’s careers if the incentive structure at universities were different so that researchers could include a major dataset when going for promotion. However, at the moment, I think a lot of the benefits are soft, such as increased visibility, so I believe we’re going to be stymied until there are some harder benefits for researchers, unfortunately.

Furthermore, the more people who are passionate about this are in senior positions and in a position to hire, the better. You are already seeing that Principal Investigators (PIs) in charge of labs are saying, “My policy is being an open researcher, and if you want to come work at my lab, this is how we do it.” We need more PIs to do the same. At the moment, that’s incredibly unusual.

Currently, we are seeing more engagement with Open Data from early career researchers than we do from PIs. I understand that: if you’re a PI, you have to run the lab, and you don’t want to think about yet another thing to do if you can’t see a clear benefit. PIs have a large influence on junior researchers’ behaviour, so until there are clearer benefits from Open Data for PIs it’s going to be tricky to get wider engagement.

What other benefits are there to sharing data more openly?

A friend was studying microbiology last year with a hands-off supervisor and it showed me a lot of the day-to-day difficulties caused by closed data. My friend was trying to find protocols, trying to work them out, and he was spending months redoing things. What he was doing wasn’t cutting-edge; I’m sure someone had done this before. If the protocol had been made open, then he could have saved so much time in the lab. What a complete waste of time.

I guess I came to data sharing via Open Access and, having spent some time outside of academia, I saw how frustrating it was to not be able to get access to things behind a paywall. Also, when you get to the end of an article and realise that it was really badly written but that some really interesting research had been carried out, access to that data should be there. Having also been in that situation makes you think that sharing would be a good idea.

Furthermore, opening up data also benefits start-ups, government and society. For example, look at the work that the Open Data Institute does with public datasets and governmental datasets: Open Data Institute Leeds created an app for their local council that told people when their bins were next due to be collected, and sent them a reminder. Opening up data can help bring about more useful commercial applications, too. I find it quite depressing that we have to put a monetary value on everything, but seeing as that is the policy climate that we’re in at the moment, then perhaps that will help academia to continue to justify funding research data management.

What sort of further obstacles do you see that may be preventing researchers from making their data openly available?

I think time is a really big obstacle. To create a really well-curated dataset that you want to share does take time. I also believe that people are apprehensive about sharing it if their data isn’t super clean – in case it makes them look bad. They then spend a long time cleaning it up, and that adds a barrier. I’m not sure how we help researchers overcome that. Ideally, people make clean data from the start. Things like the Open Science Framework promote doing a lot of research in a much more open environment to begin with which may help with that. I also believe that it comes back to a cultural change within academia of accepting that we’re not perfect and that mistakes will exist in data, and that this is not the end of the world, as they can be corrected. We need a slightly more collegiate spirit of not shouting and saying, “Look, there’s a mistake in your data!” but instead saying, “Let’s improve it and then I can reuse it.”

“We need a slightly more collegiate spirit of not shouting, ‘Look, there’s a mistake in your data!’ but instead saying, ‘Let’s improve it and then I can reuse it.’”

I think this is also really hard as researchers are having this conversation in public, in a climate which isn’t the most friendly, but we need to say, “Actually, everything that’s being done isn’t very good,” or “There are flaws …. Had we been working more openly, someone else could have helped catch it.”

What frustrates you the most about current systems? If you could change something about the current systems, what would it be?

I would ban pretty much any bibliometric measure, particularly at the journal level, if I could. I think the most frustrating thing is talking to researchers who know that data sharing is the right thing and want to do it, but feel that they can’t because of their PI or because they perceive it would damage their career. So we’re 70% of the way there, but that last hurdle stops the researcher from taking that action. I find this really depressing, particularly when we see it amongst PhD students, because it’s going to be so hard to change things if we can’t get them on board. They often grasp the importance but report that their PIs don’t allow them to share or don’t care. The technical work and processes are doable, but making cultural change on the upper levels is much harder.

Is there more you see that isn't being addressed in terms of data sharing, especially in terms of practicality and implementation?

There are a lot of potential things that we could talk about, including making data better quality, more usable, all of that, but I think the big thing that isn’t being done is questioning whether data sharing is being absorbed into the workflow of most labs. I’m thinking particularly in the sciences, rather than in the social sciences or arts and humanities, which are completely different areas.

It’s not something that seems to be part of the day-to-day practice. In some areas I think it is, so if we look at our most regular depositors in Chemistry, I think it must be part of lab processes because they deposit so regularly. However, I believe that this is not the case in many areas. We can tell from the enquiries we get that in a lot of areas it’s not something people think about until they come to publish and exclaim, “Oh no, the publisher says I need to share my data!” And we respond by saying, “Yes, you do, and I wish you’d thought about it two years ago. But you haven’t, so let’s work out what we can do.” So I think data sharing becoming part of business as usual is still some way off.

Do you think there is a downside to sharing data?

I think part of it is that people feel they own their data. Academics tend to be quite independent people, and even if their funding is coming from the public, they still see it as their data because they’ve worked really hard to generate it. For some people, that is their perceived downside to sharing data. Particularly working in a very decentralised university, you see that more than you do elsewhere.

We have some challenges here as well in the area of sensitive data, where we are still very much working things out, and that will hold things back. Data within the Clinical and Social Sciences is sometimes held back for good reason. I entirely understand that social scientists are sometimes extremely nervous. They work with sensitive populations that are quite vulnerable and they don’t want to share such data. I think we’ve probably got some work to do with people in those disciplines to talk them through their options: “These are the bits you could share. These are the bits you couldn’t. These are the ways you could do it and these are what your options are.”

Another challenge is in the area of longitudinal data where sharing data could potentially help out people’s careers, and I think we’re still working out how best to do that, if you’re collecting data over, say, 20 years. One example saw someone collecting data on tree rings throughout his career, producing a really valuable dataset about what was happening to the climate. He spent 40 years collecting it, and did not want to share it. I understand this inasmuch as, as a researcher, you want to get everything from your dataset that you can because you spent a lot of your life collecting it. And particularly if you have a research project that’s going on for 20 years or more, then you don’t want to share very much. However, I think we just need to work out mechanisms so that you can always stay ahead of others by sitting on your data for, say, just two years in such a situation, after which you are obliged to share. I think when you have a dataset that will evolve over time, you don’t just want to impose a single embargo based on a dataset that then changes. You want to be able to update this dataset and have a kind of rolling embargo almost. That can be difficult, and I think that’s something we’re still working out how best to support. We also still need to work out the workflows for researchers. If you’re only sharing something every two years, you don’t want it to be a long, convoluted process. It has to be something easy for the researcher to do where data, people and processes related to that data are recorded.

Is there a person / project / service that inspires you and makes you optimistic about the future of Open Science and Open Data?

The Open Science Framework excites me because it feels like the potential of everything being out in the open and connected. You can use it to set up projects and you have the potential to choose how much of your data you share. Furthermore, you can connect various cloud services to it, working a bit like an electronic lab notebook, but one that’s totally open. So you can have private projects that are works in progress, but you can also mint Digital Object Identifiers (DOIs) within it, which means the researchers can get credit for what they’re doing. You can get a DOI for your protocol and for parts of your research, so it’s not all based on just one final article that doesn’t include half of the rich research outputs that could be relevant.

What would happen if public research data were to remain closed?

I believe that if research data is closed, then science moves more slowly, and there’s a much more limited group of people who can do things with it. This is what we’ve seen in the past.

Copyright: Dr Joyce Heckman, University of Cambridge. Creative Commons CC-BY Licence.

Tags: DOI, Open Access, Open Data, Open Science, accessibility, benefit, bibliometrics, career, challenge, champions, citation, collaboration, costs, cultural change, data curation, dissemination, ethics, failure, fear, funders, industry, influence, innovation, longitudinal data, mandate, merits, metrics, motivation, ownership, policy, privacy, progress, publishing, quality, re-use, replication, repository, reproducibility, requirement, research assessment, research evaluation, sharing, simplicity, society, time, trust, visibility, workflow
