dbgap2x: an R package to explore and extract data from the database of Genotypes and Phenotypes (dbGaP)

Citation:

Versmée G, Versmée L, Dusenne M, Jalali N, Avillach P. dbgap2x: an R package to explore and extract data from the database of Genotypes and Phenotypes (dbGaP). Bioinformatics 2020;36(4):1305-1306.

Date Published:

2020 Feb 15

Abstract:

SUMMARY: Based on the Genomic Data Sharing Policy issued in August 2007, the National Institutes of Health (NIH) has supported several repositories such as the database of Genotypes and Phenotypes (dbGaP). dbGaP is an online repository that provides access to large-scale genetic and phenotypic datasets from more than 1000 studies. However, navigating the website and understanding the relationships between studies are not easy tasks. Moreover, decrypting the files is a complex procedure. In this study, we propose the dbgap2x R package, which provides a broad range of functions for searching dbGaP studies, exploring the characteristics of a study and easily decrypting files from dbGaP. AVAILABILITY AND IMPLEMENTATION: dbgap2x is an R package with the code available at https://github.com/gversmee/dbgap2x. A containerized version including the package, a Jupyter server and an example Notebook is available at https://hub.docker.com/r/gversmee/dbgap2x. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.