“Stop the anti-science of not sharing evidence”
Name: Kirstie Whitaker
Position: Research Fellow, Alan Turing Institute & Senior Research Associate, University of Cambridge
Institution: University of Cambridge, Department of Psychiatry
An interview with Kirstie Whitaker on 2 May 2017
Reproducible research, to me, means sharing the exact steps of everything that you did to come up with the results along with the data. You have to be able to say: “Here is the evidence. A totally independent observer would be able to get the same results as me.”
It’s also really inefficient to have every independent research group around the world collect its own data. That leads to underpowered studies, which produce a combination of false positives – incorrect statistically significant effects that you wouldn’t see if you ran the experiment again in a new sample – and false negatives – a failure to reject the null hypothesis even though the effect is real. Data sharing allows researchers to figure out which of those two situations they might be facing.
If you can’t find the same results in another cohort, then we can’t generalise the findings to be meaningfully interpreted. For me that means applying our knowledge at either a clinical level (working with an individual person who has depression or a developmental disorder) or at a policy level (imagining how the government are going to use scientific research to support people in schools, for example, who need to help our young people).
Reproducible research is the bare minimum, but we should aim for generalisable research so that we can take our findings and know that we can see their relevance in groups of different participants. That, to me, is how you take science as a whole forward.
“The current scientific career path doesn’t give much credit to replicating a finding, but knowing whether a result is real is incredibly important.”
The current scientific career path doesn’t give much credit to replicating a finding, but knowing whether a result is real is incredibly important. It also prevents so much wasted effort. I spent most of my PhD finding results that were the opposite direction of the published literature. I still haven’t published most of them. It’s very difficult to go against current dogma in a field.
Sharing data is much harder for me. The studies that I work on are very expensive to run so there are many people involved. I work as part of the Neuroscience in Psychiatry Network (http://www.nspn.org.uk/), which is a Wellcome Trust-funded project between the University of Cambridge and University College London. We study adolescent brain development and have questionnaire data, interview responses and brain scans from many young people. Their answers are personal and sensitive so we have to make sure we maintain our participants’ privacy as much as possible.
I’m also a member of the NSPN data management team and we’re currently working on balancing our responsibilities to our participants and making sure we get the best use from this incredibly valuable data. We want to be able to release the data and share it with the world. That involves building up lots of documentation (metadata) so that people can understand where the data has come from and how we’ve processed it. It also involves working closely with the Clinical School here in Cambridge to make sure that we are not violating participant privacy. We can’t put sensitive data about our participants’ mental health up on the Internet, but we can create a very streamlined and transparent data sharing process to allow researchers to use the data to answer new questions.
“We can’t put sensitive data about our participants’ mental health up on the Internet, but we can create a very streamlined and transparent data sharing process to allow researchers to use the data to answer new questions.”
I’m very proud of a paper that the NSPN consortium published last year on how the hubs of the brain’s structural network strengthen through the teenage years (http://dx.doi.org/10.1073/pnas.1601745113). We shared all of the code, all of the summary data from the brain regions, and replicated all the findings in two independent groups within our cohort. We split the 300 participants who had their MRI scans into a discovery and a validation cohort, and we showed the same results in both of those.
The study findings were reproducible and they replicated in a second cohort, and the data and the code are available for anyone else to use, whether they want to check our ways of working or whether they want to take our method and apply it to a different group.
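As a rough illustration of the discovery and validation split described above, here is a minimal Python sketch. It assumes a simple random split into two equal halves with a fixed seed; the interview does not say how the NSPN cohorts were actually assigned, and the participant labels and the `split_cohort` helper are hypothetical.

```python
import random


def split_cohort(participant_ids, seed=0):
    """Randomly split participants into two halves (discovery, validation).

    Assumption: a plain random split. The real study may have balanced
    the groups on age, sex or other variables.
    """
    ids = sorted(participant_ids)   # fixed starting order for reproducibility
    rng = random.Random(seed)       # local generator, so the seed is explicit
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical participant labels; the real study identifiers are not given here.
participants = [f"sub-{i:03d}" for i in range(1, 301)]
discovery, validation = split_cohort(participants, seed=42)
```

Recording the seed alongside the shared code means anyone can regenerate exactly the same two groups, run the analysis on the discovery half, and then check whether the result holds in the validation half.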
A better scenario, and one that I would advocate for going forward, is to have two different lines of consent on the form: one that says ‘I consent to be in the study,’ and another that says ‘I consent to you sharing anonymised versions of my data, making it openly available under this license.’ I have a lot of faith that our participants would want to share their information to push forward scientific understanding. I think that if they were able to give informed consent about what it meant to share their data openly, many of them would say ‘yes’.
One of the things that I’m really excited about with having these sorts of interviews with SPARC Europe is that the more examples and the more stories we have of why Open Data is so important, the easier it is to justify to others.
The data may not be ‘openly available’ but they are FAIR [https://www.nature.com/articles/sdata201618]. They are version-controlled; they’re archived and the metadata can be queried and cited.
In the Neuroscience in Psychiatry Network we have given members of the consortium the priority to work on the data during the period of the grant, but when the grant finishes, we will apply this process to all of our raw data and our processed data. We’ve managed our data with this vision in place from the beginning.
You mentioned your collaborators and that it's quite a large group. Was it difficult to get everyone on board with making the data openly available? Was this new to some people?
Within NSPN [Neuroscience in Psychiatry Network] we’ve tried to support each other in this process. I helped PhD student František Váša, who published this recent paper from NSPN (http://dx.doi.org/10.1093/cercor/bhx249), to make sure that the data was uploaded to the Cambridge repository (https://doi.org/10.17863/CAM.8856), code shared on GitHub (https://github.com/frantisekvasa/structural_network_development) and archived by Zenodo (http://dx.doi.org/10.5281/zenodo.528674), and to answer his questions. Licensing your data is outside of the scope of what we’re trained to do as researchers, and we need to make sure that we’re all sharing information and learning from each other.
There’s also a cultural challenge with Open Data, and again it comes down to the scientific reward system, that novelty and being a ‘special’ researcher or research consortium is particularly well-rewarded. There is a fear that by sharing your data or by sharing your code, others will be able to do what you wanted to do with the data, and therefore they will benefit, and you will not.
“We need more checks and balances to encourage data sharing.”
One of my personal goals for the next few years (although realistically it might be more of a lifetime plan) is to work with funders and publishers to improve the incentive structure in academia. At the moment there are a few data champions who are really pushing to share data, and there are a few funders who are working very hard to ensure that data management plans are in place and followed and that data is made available at the end of a grant. But we need more checks and balances both to encourage data sharing (by rewarding people who share their data in a usable way) and to limit the future grant-winning success of researchers who do not share their data for the benefit of others.
Once everybody is on a similar playing field then that fear of having someone else take your data and do the research you wanted to do becomes a lot less worrisome. There’s a huge amount of data available but your individual scientific ideas and the expertise that you bring to understanding that data … that’s still yours! It’s still possible to do great research, the resources will be better utilised and the community as a whole will be faster.
Others will trust and verify your work and, of course, you can use other people’s data. You can pool together multiple studies and really understand how well what you see in one cohort replicates in another cohort, how robust it is to cultural differences or differences in the way that the data is collected, or things like that. You can do analyses that wouldn’t even have been considered possible by one research group alone.
I care about supporting young people who have mental health disorders and my scientific goal is to try to help people to the best of my ability. Any researcher will have a goal of understanding the world a little bit better and to contribute to human knowledge. That’s what I find inspiring about the Open Data movement. I like the idea that the people I will be working with, the people that I will call my peers in five years’ time, will have a much more collaborative and passionate approach to answering these questions than the current competitive side of the climate.
Once we have a research climate that focuses on answering the scientific question rather than playing a game to get the next grant or get the next high impact factor publication, I think you’ll find that many people who would otherwise have chosen not to pursue careers in science because of the competitive climate will stay instead, and we as a scientific community will benefit from that.
Beyond the misaligned incentive structure, are there any other obstacles that may be preventing researchers from making their data openly available?
Let’s say that one person is a member of four different research studies. You start to reach a possibility that that data could be aggregated across the studies and that you could start to learn something about their mental health, you could learn about their financial situation, you could even start to predict whether they are at risk of diseases or disorders later in life.
The solutions I see for these problems are the two we’ve implemented within NSPN: managed access to data with an agreement that researchers will not try to identify a participant, and making sure participants understand what is involved in Open Data. Interviews like this help them to understand the costs and benefits that are associated with it, and make sure that they are able to give truly informed consent when they sign these forms.
Let me reiterate this though: if there is no person or creature that can be harmed as a result of sharing the data (maybe you’ve calculated the amount of time it takes for a cell in a dish to divide 100 times) that data has to be made available. It’s paid for by the taxpayer in almost all circumstances. It belongs to the general public and it should be shared with them.
It will also cost a lot less to do this type of research and we will be more efficient. We wouldn’t doom students to repeat studies that have already been completed but weren’t published (a situation known as the file drawer effect). We also won’t have to waste time and money collecting new data. There are certainly lots of new experiments that can be designed and new data collected for them, but there are many, many questions that can already be answered.
“My best case scenario, in my version of a utopian open world, is that people who are good at communicating, who are able to build teams, who are able to bring together experts in different fields and have them work well together, will succeed in an open scientific climate because that will be something that we know to be important.”
Everyone says that interdisciplinary research is what we need, and yet it’s very difficult to publish good interdisciplinary research. I think that’s because the competitive model for scientific progress doesn’t reward the people who build bridges and make connections, who see how different fields are able to work together. Diverse teams of researchers – whether from different academic backgrounds or people with different lived experiences of the world (for example black and minority ethnic, members of the LGBTQ community, people with disabilities) – can do more exciting and world changing research. I want to see all of these aspects of a successful project being rewarded in an academic career.
What project/service/person inspires you and makes you optimistic about Open Science, about the future and where things are headed?
“You need to support people who find the culture to be toxic.”
I truly believe that we can, in my lifetime, get to a place where we are doing science that efficiently benefits everyone. For all of the days that are difficult – talking to senior investigators who are entrenched in the status quo or hearing PhD students being incredibly frustrated because their supervisor won’t allow them to share their data or talk about their results – there are the ever-growing messages from people who have started version controlling their code and data, and sharing information. It makes me very optimistic about our future.
Young people need to stick around and help drive the cultural change. You don’t need to encourage more people to care about science. You need to support people who find the culture to be toxic.
Copyright: Dr Joyce Heckman. Creative Commons CC-BY Licence.