BACKGROUND: Inflammatory processes have been shown to play a role in dementia. To understand this role, we selected two anti-inflammatory drugs (methotrexate and sulfasalazine) to study their association with dementia risk.
METHODS: We conducted a retrospective matched case-control study of patients aged over 50 years with rheumatoid arthritis (486 dementia cases and 641 controls), identified from electronic health records in the UK, Spain, Denmark and the Netherlands. Conditional logistic regression models were fitted to estimate odds ratios for dementia.
RESULTS: Prior methotrexate use was associated with a lower risk of dementia (odds ratio [OR] 0.71, 95% CI 0.52-0.98). Methotrexate therapy lasting longer than 4 years was associated with the lowest risk of dementia (OR 0.37, 95% CI 0.17-0.79). Sulfasalazine use was not associated with dementia (OR 0.88, 95% CI 0.57-1.37).
CONCLUSIONS: Further studies are required to clarify how prior methotrexate use, the duration of methotrexate therapy, and biological treatments relate to dementia risk.
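Conditional logistic regression conditions on each matched set, so the matching factors drop out of the likelihood. The following is a minimal sketch for a single binary exposure (for example, prior methotrexate use). It is not the study's code. It assumes that each matched set is a list of 0/1 exposures with the case first, and the function name is hypothetical. A real analysis would usually add further covariates and use an established implementation.

```python
import math
from typing import Sequence


def conditional_logit_or(matched_sets: Sequence[Sequence[int]],
                         max_iter: int = 50, tol: float = 1e-10) -> float:
    """Conditional maximum-likelihood odds ratio for one binary exposure.

    Assumption of this sketch: each matched set lists 0/1 exposures with the
    case first and its matched controls after it. Maximises
        sum over sets of [beta * x_case - log(sum_j exp(beta * x_j))]
    by Newton-Raphson and returns exp(beta).
    """
    beta = 0.0  # log odds ratio
    for _ in range(max_iter):
        score = 0.0  # first derivative of the conditional log-likelihood
        info = 0.0   # observed information (minus the second derivative)
        for s in matched_sets:
            weights = [math.exp(beta * x) for x in s]
            total = sum(weights)
            mean_x = sum(w * x for w, x in zip(weights, s)) / total
            mean_x2 = sum(w * x * x for w, x in zip(weights, s)) / total
            score += s[0] - mean_x
            info += mean_x2 - mean_x ** 2  # zero for exposure-concordant sets
        if info <= 0.0:
            raise ValueError("no exposure-discordant matched sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta)


# Hypothetical 1:1 pairs: with matched pairs the estimate reduces to the
# ratio of discordant pairs (case exposed / control exposed) = 30 / 15.
pairs = [[1, 0]] * 30 + [[0, 1]] * 15 + [[1, 1]] * 20 + [[0, 0]] * 40
print(round(conditional_logit_or(pairs), 3))  # 2.0
```

Exposure-concordant sets contribute nothing to the score or the information. This is why conditional logistic regression uses only the within-set contrasts between cases and their matched controls.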
Precision medicine promises to revolutionize treatment, shifting therapeutic approaches from the classical one-size-fits-all model to approaches tailored to each patient's genomic profile, lifestyle and environmental exposures. Yet to advance precision medicine's main objective, ensuring the optimum diagnosis, treatment and prognosis for each individual, investigators need access to large-scale clinical and genomic data repositories. Despite the vast proliferation of these datasets, locating and obtaining access to many of them remains a challenge. We sought to provide an overview of available patient-level datasets that contain both genotypic data, obtained by next-generation sequencing, and phenotypic data, and to create a dynamic, online catalog for consultation, contribution and revision by the research community. Datasets included in this review meet six inclusion criteria: they (i) contain data from more than 500 human subjects; (ii) contain both genotypic and phenotypic data from the same subjects; (iii) include whole genome sequencing or whole exome sequencing data; (iv) include at least 100 recorded phenotypic variables per subject; (v) are accessible through a website or through collaboration with investigators; and (vi) make access information available in English. Using these criteria, we identified and reviewed 30 datasets and provided the results in the release version of a catalog, which is publicly available through a dynamic Web application and on GitHub. Users can review existing entries as well as contribute new datasets for inclusion (Web: https://avillachlab.shinyapps.io/genophenocatalog/; GitHub: https://github.com/hms-dbmi/GenoPheno-CatalogShiny).
SUMMARY: Based on the Genomic Data Sharing Policy issued in August 2007, the National Institutes of Health (NIH) has supported several repositories such as the database of Genotypes and Phenotypes (dbGaP). dbGaP is an online repository that provides access to large-scale genetic and phenotypic datasets from more than 1000 studies. However, navigating the website and understanding the relationships between studies are not easy tasks. Moreover, decrypting the files is a complex procedure. Here we present dbgap2x, an R package that provides a broad range of functions for searching dbGaP studies, exploring the characteristics of a study and easily decrypting files from dbGaP.
AVAILABILITY AND IMPLEMENTATION: dbgap2x is an R package with the code available at https://github.com/gversmee/dbgap2x. A containerized version including the package, a Jupyter server and an example Notebook is available at https://hub.docker.com/r/gversmee/dbgap2x.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PURPOSE: Clinicians and researchers must contextualize a patient's genetic variants against population-based references with detailed phenotyping. We sought to establish globally scalable technology, policy, and procedures for sharing biosamples and associated genomic and phenotypic data on broadly consented cohorts, across sites of care.
METHODS: Three of the nation's leading children's hospitals launched the Genomic Research and Innovation Network (GRIN), with federated information technology infrastructure, harmonized biobanking protocols, and material transfer agreements. Pilot studies in epilepsy and short stature were completed to design and test the collaboration model.
RESULTS: Harmonized, broadly consented institutional review board (IRB) protocols were approved and used for biobank enrollment, creating ever-expanding, compatible biobanks. An open source federated query infrastructure was established over genotype-phenotype databases at the three hospitals. Investigators securely access the GRIN platform for preparatory-to-research queries, receiving aggregate counts of patients with particular phenotypes or genotypes in each biobank. With proper approvals, de-identified data are exported to a shared analytic workspace. Investigators at all sites enthusiastically collaborated on the pilot studies, resulting in multiple publications. Investigators have also begun to use the infrastructure successfully for grant applications.
CONCLUSIONS: The GRIN collaboration establishes the technology, policy, and procedures for a scalable genomic research network.
BACKGROUND: Relatively little is known about antepartum suicidal behaviour and pregnancy outcomes. We examined associations of antepartum suicidal behaviour, alone and in combination with psychiatric disorders, with adverse infant and obstetric outcomes.
METHODS: We included 188 925 singleton livebirths from a retrospective cohort (1996-2016). Suicidal behaviour, psychiatric disorders, and outcomes were derived from electronic medical records. We performed multivariable logistic regressions with generalised estimating equations to estimate adjusted odds ratios (aOR) with 95% confidence intervals (95%CI).
RESULTS: The prevalence of antepartum suicidal behaviour was 152.44 per 100 000 singleton livebirths. Nearly two-thirds (64.24%) of women with suicidal behaviour also had psychiatric disorders. Compared with women with neither psychiatric disorders nor suicidal behaviour, women with psychiatric disorders alone had 1.3-fold to 1.4-fold increased odds of delivering low birthweight or preterm infants and 1.2-fold increased odds of experiencing obstetric complications. Women with suicidal behaviour alone had increased odds of preterm labour (aOR 2.05, 95% CI 1.16, 3.62). Women with both suicidal behaviour and psychiatric disorders had more than twofold increased odds of delivering low birthweight (aOR 2.52, 95% CI 1.40, 4.54), preterm (aOR 2.44, 95% CI 1.63, 3.66), and low birthweight/preterm (aOR 2.30, 95% CI 1.54, 3.44) infants. They also had elevated odds of preterm labour (aOR 1.62, 95% CI 1.06, 2.47), placental abruption (aOR 2.33, 95% CI 1.20, 4.51), preterm rupture of membranes (aOR 1.63, 95% CI 1.08, 2.46), and postpartum haemorrhage (aOR 1.93, 95% CI 1.09, 3.40).
CONCLUSIONS: Antepartum suicidal behaviour, when co-occurring with psychiatric disorders, is associated with increased odds of adverse infant and obstetric outcomes. Future studies are warranted to understand the causal roles of suicidal behaviour and psychiatric disorders in pregnancy.
Allele-specific analyses of frequency differences across populations, particularly populations that are not well studied, help identify variants that may have a functional effect on disease mechanisms and phenotypic predisposition, and can inform new genome-wide association studies (GWAS). We aimed to compare the allele frequencies of 11 asthma-associated and 16 liver disease-associated single nucleotide polymorphisms (SNPs) between the Estonian population (Estonian Genome Center, University of Tartu; EGCUT) and the HapMap and 1000 Genomes Project populations. When comparing EGCUT with HapMap populations, the largest difference in allele frequencies was observed with the Maasai population in Kinyawa, Kenya, with 12 SNPs differing significantly. Similarly, when comparing EGCUT with 1000 Genomes Project populations, the largest difference in allele frequencies was observed with pooled African populations, with 22 SNPs differing significantly. For these 11 asthma-associated and 16 liver disease-associated SNPs, Estonians are genetically similar to other European populations but differ significantly from African populations. Understanding differences in genetic architecture between ethnic populations is important for designing new GWAS targeted at underserved ethnic groups. Such studies could yield novel genetic findings that aid the development of new therapies to reduce morbidity and mortality.
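The abstract does not name the statistical test used to compare allele frequencies. A common choice is a Pearson chi-square test on a 2×2 table of allele counts from the two populations. The following is a minimal sketch under that assumption. The function name and the counts are hypothetical and do not come from EGCUT, HapMap or the 1000 Genomes Project.

```python
import math
from typing import Tuple


def allele_chi_square(alt_a: int, ref_a: int,
                      alt_b: int, ref_b: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) comparing
    allele counts between population A and population B.

    Returns (chi2, p_value).
    """
    table = [[alt_a, ref_a], [alt_b, ref_b]]
    n = alt_a + ref_a + alt_b + ref_b
    row_totals = [alt_a + ref_a, alt_b + ref_b]
    col_totals = [alt_a + alt_b, ref_a + ref_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Chi-square with 1 df is a squared standard normal, so
    # P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30/100 alternative alleles in one population vs 50/100.
chi2, p = allele_chi_square(30, 70, 50, 50)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Counting alleles rather than individuals doubles the sample size and implicitly assumes Hardy-Weinberg equilibrium. When many SNPs are compared at once, the significance threshold would also need a multiple-testing correction.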
The Estonian Biobank (Biobank), governed by the Institute of Genomics at the University of Tartu, has stored genetic material (DNA) and continuously collected data since 2002 on a growing cohort that currently totals 52,274 individuals, representing ~5% of the Estonian adult population. To explore the utility of the data available in the Biobank, we conducted a phenome-wide association study (PheWAS) in two areas of interest to healthcare researchers: asthma and liver disease. We used 11 asthma-associated and 13 liver disease-associated single nucleotide polymorphisms (SNPs), identified from published genome-wide association studies, to test our ability to detect established associations. We confirmed 2 asthma-associated and 5 liver disease-associated variants at nominal significance, with effects directionally consistent with published results. Two associations were in the opposite direction to previously published results (rs4374383:AA increased the risk of NASH/NAFLD; rs11597086 increased ALT level). Three SNP-diagnosis pairs passed the phenome-wide significance threshold: rs9273349 and E06 (thyroiditis, p = 5.50×10⁻⁸); rs9273349 and E10 (type 1 diabetes, p = 2.60×10⁻⁷); and rs2281135 and K76 (non-alcoholic liver diseases, including NAFLD, p = 4.10×10⁻⁷). These results validate our approach and confirm the quality of the data for these conditions. Importantly, they demonstrate that the extensive genetic and medical information in the Estonian Biobank can be successfully used for scientific research.
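The abstract refers to a phenome-wide significance threshold without stating how it was derived. One common choice is a Bonferroni correction over all SNP-phenotype tests. The following is a minimal sketch under that assumption. The 24 SNPs match the text (11 asthma + 13 liver disease), but the number of diagnosis codes tested is a made-up placeholder, and the function names are hypothetical.

```python
def phenome_wide_threshold(alpha: float, n_snps: int, n_phenotypes: int) -> float:
    """Bonferroni per-test threshold when every SNP is tested against every
    phenotype (e.g. ICD-10 diagnosis code), keeping the family-wise error
    rate at alpha."""
    n_tests = n_snps * n_phenotypes
    if n_tests < 1:
        raise ValueError("need at least one SNP-phenotype test")
    return alpha / n_tests


def is_phenome_wide_significant(p_value: float, threshold: float) -> bool:
    return p_value < threshold


# Hypothetical: 24 SNPs x 1000 diagnosis codes (placeholder, not the study's count).
threshold = phenome_wide_threshold(0.05, 24, 1000)
print(f"{threshold:.3g}")  # 2.08e-06
```

Bonferroni is conservative when tests are correlated, for example when related diagnosis codes are tested against the same SNP. Some PheWAS analyses therefore use false discovery rate control instead.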
Clarke DJB, Wang L, Jones A, Wojciechowicz ML, Torre D, Jagodnik KM, Jenkins SL, McQuilton P, Flamholz Z, Silverstein MC, Schilder BM, Robasky K, Castillo C, Idaszak R, Ahalt SC, Williams J, Schurer S, Cooper DJ, de Miranda Azevedo R, Klenk JA, Haendel MA, Nedzel J, Avillach P, Shimoyama ME, Harris RM, Gamble M, Poten R, Charbonneau AL, Larkin J, Brown TC, Bonazzi VR, Dumontier MJ, Sansone S-A, Ma'ayan A. FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources. Cell Syst 2019;9(5):417-421.
As more digital resources are produced by the research community, it is becoming increasingly important to harmonize and organize them for synergistic utilization. The findable, accessible, interoperable, and reusable (FAIR) guiding principles have prompted many stakeholders to consider strategies for tackling this challenge. The FAIRshake toolkit was developed to enable the establishment of community-driven FAIR metrics and rubrics paired with manual and automated FAIR assessments. FAIR assessments are visualized as an insignia that can be embedded within websites that host digital resources. Using FAIRshake, a variety of biomedical digital resources were manually and automatically evaluated for their level of FAIRness.
De novo and inherited rare genetic disorders (RGDs) are a major cause of human morbidity, frequently involving neuropsychiatric symptoms. Recent advances in genomic technologies and data sharing have revolutionized the identification and diagnosis of RGDs, presenting an opportunity to elucidate the mechanisms underlying neuropsychiatric disorders by investigating the pathophysiology of high-penetrance genetic risk factors. Here we seek out the best path forward for achieving these goals. We think future research will require consistent approaches across multiple RGDs and developmental stages, involving both the characterization of shared neuropsychiatric dimensions in humans and the identification of neurobiological commonalities in model systems. A coordinated and concerted effort across patients, families, researchers, clinicians and institutions, including rapid and broad sharing of data, is now needed to translate these discoveries into urgently needed therapies.
OBJECTIVE: To estimate the risk of acute myocardial infarction (AMI) or stroke in adults with non-alcoholic fatty liver disease (NAFLD) or non-alcoholic steatohepatitis (NASH).
DESIGN: Matched cohort study.
SETTING: Population based, electronic primary healthcare databases before 31 December 2015 from four European countries: Italy (n=1 542 672), Netherlands (n=2 225 925), Spain (n=5 488 397), and UK (n=12 695 046).
PARTICIPANTS: 120 795 adults with a recorded diagnosis of NAFLD or NASH and no other liver diseases. At the time of NAFLD diagnosis (index date), each was matched with up to 100 patients without NAFLD or NASH in the same database by age, sex, practice site, and a visit recorded within six months before or after the date of diagnosis.
MAIN OUTCOME MEASURES: Primary outcome was incident fatal or non-fatal AMI and ischaemic or unspecified stroke. Hazard ratios were estimated using Cox models and pooled across databases by random effect meta-analyses.
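Pooling hazard ratios across databases is done on the log scale. The abstract does not name the random-effects estimator, so the following sketch assumes the widely used DerSimonian-Laird method. It also assumes that each database's standard error is recovered from the width of its 95% CI. The function name and input values are hypothetical and do not reproduce the study's estimates.

```python
import math
from typing import List, Tuple

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal


def pool_hazard_ratios(
        estimates: List[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """DerSimonian-Laird random-effects pooling of (hr, lower95, upper95) tuples.

    Returns (pooled_hr, lower95, upper95).
    """
    y = [math.log(hr) for hr, _, _ in estimates]                          # log HRs
    se = [(math.log(hi) - math.log(lo)) / (2 * Z95) for _, lo, hi in estimates]
    w = [1 / s ** 2 for s in se]                                          # inverse-variance weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))             # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0                       # between-database variance
    w_re = [1 / (s ** 2 + tau2) for s in se]                              # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return (math.exp(y_re),
            math.exp(y_re - Z95 * se_re),
            math.exp(y_re + Z95 * se_re))


# Hypothetical per-database hazard ratios with 95% CIs (not the study's data).
print(pool_hazard_ratios([(1.10, 0.95, 1.27), (1.25, 1.05, 1.49), (1.15, 0.98, 1.35)]))
```

When the databases disagree by more than sampling error would explain (Q > df), tau² becomes positive. This widens the pooled CI relative to a fixed-effect analysis.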
RESULTS: 120 795 patients with recorded NAFLD or NASH diagnoses were identified, with mean follow-up of 2.1-5.5 years. After adjustment for age and smoking, the pooled hazard ratio for AMI was 1.17 (95% confidence interval 1.05 to 1.30; 1035 events in participants with NAFLD or NASH, 67 823 in matched controls). In a group with more complete data on risk factors (86 098 patients with NAFLD and 4 664 988 matched controls), the hazard ratio for AMI after adjustment for systolic blood pressure, type 2 diabetes, total cholesterol level, statin use, and hypertension was 1.01 (0.91 to 1.12; 747 events in participants with NAFLD or NASH, 37 462 in matched controls). After adjustment for age and smoking status, the pooled hazard ratio for stroke was 1.18 (1.11 to 1.24; 2187 events in participants with NAFLD or NASH, 134 001 in matched controls). In the group with more complete data on risk factors, the hazard ratio for stroke was 1.04 (0.99 to 1.09; 1666 events in participants with NAFLD, 83 882 in matched controls) after further adjustment for type 2 diabetes, systolic blood pressure, total cholesterol level, statin use, and hypertension.
CONCLUSIONS: The diagnosis of NAFLD in current routine care of 17.7 million patients appears not to be associated with AMI or stroke risk after adjustment for established cardiovascular risk factors. Cardiovascular risk assessment in adults with a diagnosis of NAFLD is important but should be done in the same way as for the general population.
We developed algorithms to identify pregnant women with suicidal behavior using information extracted from clinical notes in electronic medical records by natural language processing (NLP). Using both codified data and NLP applied to unstructured clinical notes, we first screened pregnant women in Partners HealthCare for suicidal behavior. Psychiatrists manually reviewed clinical charts to identify relevant features for suicidal behavior and to obtain gold-standard labels. Using the adaptive elastic net, we developed algorithms to classify suicidal behavior. We then validated the algorithms in an independent validation dataset. Of 275,843 women with codes related to pregnancy or delivery, 9331 screened positive for suicidal behavior by either codified data (N = 196) or NLP (N = 9,145). Using expert-curated features, our algorithm achieved an area under the curve of 0.83. Setting the positive predictive value comparable to that of diagnostic codes related to suicidal behavior (0.71), we obtained a sensitivity of 0.34, specificity of 0.96, and negative predictive value of 0.83. The algorithm identified 1423 pregnant women with suicidal behavior among the 9331 women who screened positive. Mining unstructured clinical notes using NLP resulted in an 11-fold increase in the number of pregnant women identified with suicidal behavior, compared with relying solely on diagnostic codes.
BACKGROUND: We examined the comparative performance of structured diagnostic codes vs. natural language processing (NLP) of unstructured text for screening for suicidal behavior among pregnant women in electronic medical records (EMRs).
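The final step described above fixes a target positive predictive value and reports the sensitivity, specificity and NPV obtained at the matching cut-off. The following is a minimal sketch of that evaluation step only. It assumes that classifier scores (for example, from the adaptive elastic net) and chart-review labels are already available. All names and data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Counts:
    tp: int  # algorithm positive, chart review confirms suicidal behavior
    fp: int  # algorithm positive, chart review does not confirm
    fn: int  # algorithm negative, chart review confirms
    tn: int  # algorithm negative, chart review does not confirm


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def screening_metrics(c: Counts) -> Dict[str, float]:
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


def counts_at(scores: List[float], labels: List[int], cutoff: float) -> Counts:
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        predicted = s >= cutoff
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, fn, tn)


def cutoff_for_ppv(scores: List[float], labels: List[int],
                   target_ppv: float) -> Tuple[float, Dict[str, float]]:
    """Lowest cut-off whose PPV reaches target_ppv.

    Raising the cut-off can never increase sensitivity, so the first qualifying
    cut-off in ascending order is the most sensitive one meeting the target.
    """
    for cutoff in sorted(set(scores)):
        m = screening_metrics(counts_at(scores, labels, cutoff))
        if m["ppv"] >= target_ppv:  # a nan PPV compares False and is skipped
            return cutoff, m
    raise ValueError("no cut-off reaches the target PPV")


# Hypothetical scores and chart-review labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(cutoff_for_ppv(scores, labels, 0.71))
# (0.6, {'sensitivity': 0.75, 'specificity': 0.75, 'ppv': 0.75, 'npv': 0.75})
```

Fixing the PPV at the level achieved by diagnostic codes makes the comparison like-for-like: any gain then appears as additional true positives rather than a trade-off in precision.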
METHODS: Women aged 10-64 years with at least one diagnostic code related to pregnancy or delivery (N = 275,843) from Partners HealthCare were included as our "datamart." Diagnostic codes related to suicidal behavior were applied to the datamart to screen women for suicidal behavior. Among women without any diagnostic codes related to suicidal behavior (n = 273,410), 5880 women were randomly sampled, of whom 1120 had at least one mention of terms related to suicidal behavior in clinical notes. NLP was then used to process clinical notes for the 1120 women. Chart reviews were performed for subsamples of women.
RESULTS: Using diagnostic codes, 196 pregnant women were screened positive for suicidal behavior, among whom 149 (76%) had confirmed suicidal behavior by chart review. Using NLP among those without diagnostic codes, 486 pregnant women were screened positive for suicidal behavior, among whom 146 (30%) had confirmed suicidal behavior by chart review.
CONCLUSIONS: The use of NLP substantially improves the sensitivity of screening for suicidal behavior in EMRs. However, the prevalence of confirmed suicidal behavior was lower among women who did not have diagnostic codes for suicidal behavior but screened positive by NLP. NLP should be used together with diagnostic codes in future EMR-based phenotyping studies of suicidal behavior.
BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is the most common cause of liver disease worldwide. It affects an estimated 20% of the general population, based on cohort studies of varying size and heterogeneous selection. However, the prevalence and incidence of recorded NAFLD diagnoses in unselected real-world health-care records is unknown. We harmonised health records from four major European territories and assessed age- and sex-specific point prevalence and incidence of NAFLD over the past decade.
METHODS: Data were extracted from The Health Improvement Network (UK), Health Search Database (Italy), Information System for Research in Primary Care (Spain) and Integrated Primary Care Information (Netherlands). Each database uses a different coding system. Prevalence and incidence estimates were pooled across databases by random-effects meta-analysis after a log-transformation.
RESULTS: Data were available for 17,669,973 adults, of whom 176,114 had a recorded diagnosis of NAFLD. Pooled prevalence trebled from 0.60% (95% confidence interval: 0.41-0.79) in 2007 to 1.85% (0.91-2.79) in 2014. Incidence nearly doubled, from 1.32 (0.83-1.82) to 2.35 (1.29-3.40) per 1000 person-years. The FIB-4 non-invasive estimate of liver fibrosis could be calculated in 40.6% of patients, of whom 29.6-35.7% had indeterminate or high-risk scores.
CONCLUSIONS: In the largest primary-care record study of its kind to date, rates of recorded NAFLD are much lower than expected suggesting under-diagnosis and under-recording. Despite this, we have identified rising incidence and prevalence of the diagnosis. Improved recognition of NAFLD may identify people who will benefit from risk factor modification or emerging therapies to prevent progression to cardiometabolic and hepatic complications.
Adverse obstetric and neonatal outcomes among women with psychosis, particularly affective psychosis, have rarely been studied at the population level. We aimed to assess the risk of adverse obstetric and neonatal outcomes among women with psychosis (schizophrenia, affective psychosis, and other psychoses).
INTRODUCTION: The European Medical Information Framework consortium has assembled electronic health record (EHR) databases for dementia research. We calculated dementia prevalence and incidence in 25 million persons from 2004 to 2012.
METHODS: Six EHR databases (three primary care and three secondary care) from five countries were interrogated. Dementia was ascertained by consensus harmonization of clinical/diagnostic codes. Annual period prevalences and incidences by age and gender were calculated and meta-analyzed.
RESULTS: The six databases contained 138,625 dementia cases. Age-specific prevalences were around 30% of published estimates from community samples and incidences were around 50%. Pooled prevalences had increased from 2004 to 2012 in all age groups but pooled incidences only after age 75 years. Associations with age and gender were stable over time.
DISCUSSION: The European Medical Information Framework initiative supports EHR data on an unprecedented number of people with dementia. Age-specific prevalences and incidences mirror the pattern of estimates from community samples, at levels that are lower but increasing over time.
OBJECTIVE: The effects of suicidal behavior on obstetric outcomes remain dangerously unquantified. We sought to report on the risk of adverse obstetric outcomes for US women with suicidal behavior at the time of delivery.
METHODS: We performed a cross-sectional analysis of delivery hospitalizations from the 2007-2012 National (Nationwide) Inpatient Sample. From the same hospitalization record, International Classification of Diseases codes were used to identify suicidal behavior and adverse obstetric outcomes. Adjusted odds ratios (aOR) and 95% confidence intervals (CI) were obtained using logistic regression.
RESULTS: Of the 23,507,597 delivery hospitalizations, 2,180 were complicated by suicidal behavior. Women with suicidal behavior were at a heightened risk for outcomes including antepartum hemorrhage (aOR = 2.34; 95% CI: 1.47-3.74), placental abruption (aOR = 2.07; 95% CI: 1.17-3.66), postpartum hemorrhage (aOR = 2.33; 95% CI: 1.61-3.37), premature delivery (aOR = 3.08; 95% CI: 2.43-3.90), stillbirth (aOR = 10.73; 95% CI: 7.41-15.56), poor fetal growth (aOR = 1.70; 95% CI: 1.10-2.62), and fetal anomalies (aOR = 3.72; 95% CI: 2.57-5.40). Maternal suicidal behavior was not significantly associated with cesarean delivery, induction of labor, premature rupture of membranes, excessive fetal growth, or fetal distress. The mean length of stay was longer for women with suicidal behavior.
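The study's adjusted odds ratios come from multivariable logistic regression, which this sketch does not reproduce. The sketch shows only the unadjusted building block: a crude odds ratio from a 2×2 table, with a Woolf 95% CI formed on the log scale. This is the same log-scale construction used for regression-based CIs. The function name and cell counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.959963984540054) -> Tuple[float, float, float]:
    """Crude odds ratio with a Woolf (log-scale) 95% CI from a 2x2 table:

                             outcome   no outcome
        exposed                 a          b
        unexposed               c          d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or exact methods")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical counts, not taken from the National Inpatient Sample.
or_, lo, hi = odds_ratio_woolf(20, 80, 10, 90)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself. This matches the pattern in the reported intervals, such as 10.73 (7.41-15.56).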
CONCLUSION: During delivery hospitalization, women with suicidal behavior are at increased risk for many adverse obstetric outcomes, highlighting the importance of screening for and providing appropriate clinical care for women with suicidal behavior during pregnancy.
Increasingly, biobanks are being developed to support organized collections of biological specimens and associated clinical information on broadly consented, diverse patient populations. We describe the implementation of a pediatric biobank comprising a fully informed patient cohort, linking specimens to phenotypic data derived from electronic health records (EHR). The Biobank was launched after input from multiple stakeholders and implemented initially in a pilot phase before hospital-wide expansion in 2016. In-person informed consent is obtained from all participants enrolling in the Biobank and provides permission to: (1) access EHR data for research; (2) collect and use residual specimens produced as by-products of routine care; and (3) share de-identified data and specimens outside of the institution. Participants are recruited throughout the hospital, across diverse clinical settings. We have enrolled 4900 patients to date, and 41% of these have an associated blood sample for DNA processing. Current efforts are focused on aligning the Biobank with other ongoing research efforts at our institution and extending our electronic consenting system to support remote enrollment. We review a number of pediatric-specific challenges and opportunities, including the need to re-consent patients when they reach 18 years of age, the ability to enroll family members accompanying patients, and alignment with disease-specific research efforts at our institution and other pediatric centers to increase cohort sizes, particularly for rare diseases.
Motivation: In the era of big data and precision medicine, the number of databases containing clinical, environmental, self-reported, and biochemical variables is increasing exponentially. Enabling experts to focus on their research questions rather than on computational data management, access and analysis is one of the most significant current challenges.
Results: We present Rcupcake, an R package with functions for accessing different databases through the BD2K PIC-SURE RESTful API and for facilitating their query, analysis and interpretation. The package offers several analysis and visualization tools, including the study of phenotype co-occurrence and prevalence across multiple layers of data, such as the phenome, exposome or genome.
Availability: The package is implemented in R and is available under Mozilla v2 license from GitHub (https://github.com/hms-dbmi/Rcupcake). Two reproducible case studies are also available (https://github.com/hms-dbmi/Rcupcake-case-studies/blob/master/SSCcaseStu..., https://github.com/hms-dbmi/Rcupcake-case-studies/blob/master/NHANEScase...).
Supplementary information: Supplementary data are available at Bioinformatics online.
The first year of university is a particularly stressful period and can impact academic performance and students' health. The aim of this study was to evaluate the health and lifestyle of undergraduates and assess risk factors associated with psychiatric symptoms.
Between September 2012 and June 2013, we included all undergraduate students who underwent a compulsory medical visit at the university medical service in Nice (France), during which they were screened for potential diseases in a diagnostic interview. Data were collected prospectively in the CALCIUM database (Consultations Assistés par Logiciel pour les Centres Inter-Universitaire de Médecine) and included information about the students' lifestyle (living conditions, dietary behavior, physical activity, use of recreational drugs). The prevalence of psychiatric symptoms related to depression, anxiety and panic attacks was assessed, and risk factors for these symptoms were analyzed using logistic regression.
A total of 4,184 undergraduates were included. The prevalences of depression, anxiety and panic attacks were 12.6%, 7.6% and 1.0%, respectively. During the 30 days preceding the evaluation, 0.6% of the students regularly drank alcohol, 6.3% were frequent-to-heavy tobacco smokers, and 10.0% smoked marijuana. Dealing with financial difficulties and having learning disabilities were associated with psychiatric symptoms. Students who were dissatisfied with their living conditions and those with poor dietary behavior were at risk of depression. Being a woman and living alone were associated with anxiety. Students who screened positive for any of the assessed psychiatric disorders were at higher risk of concomitantly having another psychiatric disorder.
The prevalence of psychiatric disorders in undergraduate students is low, but the proportion of students at risk of developing chronic disease is far from negligible. Understanding predictors of these symptoms may help improve students' health through targeted prevention campaigns. Further research in other French universities is necessary to confirm our results.