Access to raw data from preclinical studies
Peter Kamerman
26 November 2015
Rosie Morland (PhD) published an excellent commentary on the Pain Research Forum (PRF) website earlier this year in which she called for open access to raw data from preclinical animal studies.
You can access the full article here:
Access to Raw Behavioral Data From Preclinical Research Papers
Rosemary Morland (September 21, 2015)
There is a revolution happening in scientific publishing, spurred by the ability to include full sets of raw data alongside original research articles. This level of data transparency is changing the way we both view and use publications. Articles are no longer static, but can be updated as new findings become available, and can also act as data sources in their own right, not just in terms of systematic review and meta-analysis, but by providing the complete, raw data sets for subsequent re-analysis by other researchers who may discover novel findings, for instance, by applying different analysis paradigms that may not have been available to the original authors at the time of publication.
This is an important topic, and one in which the medical sciences lag far behind other fields. I had not directly encountered the issue until PLoS One recently asked us to make the data for a study available, which we did. But making data available is pointless if the supporting documentation is missing or does not adequately describe how the data were processed. Ideally, the material made available should include: 1) the raw data, 2) the tidy dataset (post-cleaning), 3) the codebook, and 4) the data analysis scripts (which should include how the data were cleaned). The data provided should allow others to reproduce the authors’ analyses, and extend them if appropriate. This is the concept of ‘reproducible research’ (if you are unfamiliar with this concept, rOpenSci and Roger Peng provide good descriptions).
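As a minimal sketch of how those four pieces fit together, the example below keeps the raw data untouched, derives a tidy dataset through a scripted cleaning step, documents each variable in a codebook, and runs a simple analysis on the tidy data. It is written in Python purely for illustration (the workflow described in this post uses R), and the column names, values, and function names are all hypothetical, not taken from any real study.

```python
import csv
import io
import statistics

# 1) Raw data, kept exactly as recorded.
# Hypothetical values and column names, invented for illustration only.
RAW_CSV = """animal_id,group,withdrawal_threshold_g
r01,sham,14.8
r02,sham,15.2
r03,injury,4.1
r04,injury,NA
r05,injury,3.7
"""

# 3) Codebook: one entry per variable in the tidy dataset.
CODEBOOK = {
    "animal_id": "Unique animal identifier (string)",
    "group": "Experimental group: 'sham' or 'injury'",
    "withdrawal_threshold_g": "Paw withdrawal threshold in grams (float)",
}


def load_raw(text):
    """Read the raw CSV into a list of dicts without altering any values."""
    return list(csv.DictReader(io.StringIO(text)))


def tidy(rows):
    """2) Cleaning step: drop rows with a missing measurement, convert types."""
    cleaned = []
    for row in rows:
        if row["withdrawal_threshold_g"] == "NA":
            continue  # exclusion rule is visible in the script, not done by hand
        cleaned.append({
            "animal_id": row["animal_id"],
            "group": row["group"],
            "withdrawal_threshold_g": float(row["withdrawal_threshold_g"]),
        })
    return cleaned


def group_means(rows):
    """4) Analysis step: mean withdrawal threshold per group."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row["group"], []).append(row["withdrawal_threshold_g"])
    return {group: statistics.mean(values) for group, values in by_group.items()}


raw_rows = load_raw(RAW_CSV)
tidy_rows = tidy(raw_rows)
means = group_means(tidy_rows)
print(means)
```

Because the exclusion of the missing measurement happens in code rather than in a spreadsheet, anyone who has the raw file and the script can regenerate the tidy dataset and see exactly which rows were dropped and why.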
Listing these requirements is much easier than achieving them. Most biomedical researchers receive limited exposure to data wrangling, data management, and statistics during their training (usually you learn whatever stats your supervisor/mentor uses, and you pursue the ‘all important p-value’ using canned, GUI-based statistics packages that are user-friendly, but poor for documenting your analytical approach). This poor awareness of data handling contrasts with the meticulous schooling they often receive in experimental design and execution (including record keeping: the ever-present lab book).
When PLoS One asked me to provide access to my data, I was fortunate in that I was using R and the RStudio IDE for my analysis. But even for the uninitiated, this combination provides an easy-to-learn literate programming platform for documenting your analyses (from data munging through to your final analyses and making pretty plots) using the ultra-easy-to-learn markup language, rmarkdown, and Yihui Xie’s awesome knitr package. You just have to push the ‘knit’ button when you are done. And if you are intimidated by R, there is a wealth of high-quality, and mostly free, online learning material to get you going (e.g., DataCamp, CodeSchool). Not to mention the ever-helpful community at stackoverflow.com; if you are having a problem, someone else has probably experienced a similar one, and Googling it will send you to the answer.
There are numerous online resources available to host your data and documentation. I primarily use GitHub.com, which is used mainly by programmers and data scientists. It’s free, it is designed to facilitate collaboration and sharing, it allows version control, licenses can be assigned, private repositories are available while you are still working on the data, and it can generate simple webpages for each project. As examples, follow these links to the original github repository and the simple github-generated webpage for the study I mentioned earlier (please forgive me, only the cleaned data are posted - I was new to this). If you want a DOI for your work, there are free services that will provide you with one (e.g., Zenodo, Figshare).
Data drives innovation, and I believe that progress in the field of pain would be greatly enhanced if we were afforded the opportunity to interact directly with other researchers’ data, and to combine and directly compare it with our own data or data from other sources.