NLM Foundation: Delineating autism subtypes by creating a patient-centric information commons

NLM Foundation, $100k. PI: Paul Avillach, MD, PhD

Autism spectrum disorders (ASD) are complex and heterogeneous conditions. Reported prevalence varies across studies from 5 to 72 cases per 10,000 children. A genetic contribution to ASD is well established, but the complexity of the disorder has prevented researchers from finding clear associations between polymorphisms and autism using standard screening methods such as genome-wide association studies (GWAS).

In a GWAS, a specific disease is studied by screening the whole genome of patients with the disease and comparing it with that of a control group without the disease.

Specialists have suggested new approaches such as the "genotype-first" approach, also known as phenome-wide association studies (PheWAS). In this approach, the selection criterion is no longer a disease but a specific genotype: the variants of an identified gene of interest. Associations between these variants and diseases are then assessed systematically. In other words, a PheWAS studies a specific gene by screening all the diseases of patients with or without a specific allele of that gene. This approach could allow the discovery of new subtypes of ASD. In previous work on thiopurine methyltransferase enzymatic activity in the context of thiopurine therapy, we demonstrated that this method can help describe new subgroups of patients with specific characteristics. The PheWAS approach might therefore link gene variants to specific subgroups of ASD. Some of these subgroups could benefit from a specific therapy, given that earlier treatment can improve outcomes.
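The genotype-first screen described above can be illustrated with a minimal sketch, assuming each patient has a binary carrier status for one variant and a set of coded phenotypes. For every phenotype code, patients are split into a 2x2 table (carrier or not, phenotype present or not) and tested for association. The function names, toy patient IDs and phenotype codes are hypothetical, and the choice of Fisher's exact test with a Bonferroni correction is an illustrative simplification (PheWAS implementations often use covariate-adjusted logistic regression instead); it is not the method used in our thiopurine work.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of the table whose top-left cell is x (margins fixed).
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table at most as likely as the observed one (small tolerance for float error).
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def phewas(carriers, phenotypes):
    """Test one variant against every phenotype code (hypothetical helper).

    carriers:   {patient_id: True if the patient carries the variant}
    phenotypes: {patient_id: set of phenotype codes recorded for the patient}
    Returns (code, p_value, bonferroni_p) tuples sorted by p-value.
    """
    codes = sorted(set().union(*phenotypes.values()))
    results = []
    for code in codes:
        a = b = c = d = 0  # carrier+phenotype, carrier only, phenotype only, neither
        for pid, is_carrier in carriers.items():
            has_code = code in phenotypes.get(pid, set())
            if is_carrier and has_code:
                a += 1
            elif is_carrier:
                b += 1
            elif has_code:
                c += 1
            else:
                d += 1
        p = fisher_exact_two_sided(a, b, c, d)
        results.append((code, p, min(1.0, p * len(codes))))
    return sorted(results, key=lambda r: r[1])


if __name__ == "__main__":
    # Toy data; patient IDs and phenotype codes are hypothetical.
    carriers = {"p1": True, "p2": True, "p3": False, "p4": False}
    phenotypes = {"p1": {"seizures"}, "p2": {"seizures", "sleep"}, "p3": {"sleep"}, "p4": set()}
    for code, p, p_adj in phewas(carriers, phenotypes):
        print(f"{code}: p={p:.3f}, Bonferroni p={p_adj:.3f}")
```

In the toy data, "seizures" is recorded for both carriers and neither non-carrier, so it ranks first. With real cohorts the same loop would run over thousands of phenotype codes, which is why some multiple-testing correction is required; Bonferroni is used here only because it is the simplest.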

This kind of study requires large amounts of both genetic and clinical data. Large cohorts of ASD patients and families exist, such as the Simons Simplex Collection (SSC), the Autism Genetic Resource Exchange (AGRE) and the Autism Consortium (AC) cohort. They gather genetic and clinical data from thousands of patients and represent a very promising source of data. However, one of the main issues preventing broad research programs across these cohorts is the heterogeneity of the assessment tools used for ASD clinical evaluation, which leaves the data fragmented. It is very difficult to analyze clinical data spread over several different tools, representing thousands of questions and answers that may or may not overlap between tools. Integrating the genetic and clinical data from these cohorts into a single patient-centric information platform, and unifying and harmonizing the clinical data into a single concept-based ontology, might enable their effective use in research.
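The harmonization step can be sketched as a mapping from instrument-specific questions to shared ontology concepts, followed by a pivot into one record per patient. This is a minimal sketch under stated assumptions: the tool names, item codes, concept names and unit conversions below are hypothetical placeholders, and a real mapping would be curated by domain experts against the actual assessment tools. Answers are kept as lists so that overlapping questions from different tools are preserved side by side rather than silently overwritten, and unmapped items are reported for manual curation.

```python
from collections import defaultdict

# Hypothetical mapping from (tool, item) to a shared ontology concept.
CONCEPT_MAP = {
    ("toolA", "q12"): "concept:first_words_age_months",
    ("toolB", "item_7"): "concept:first_words_age_months",
    ("toolA", "q30"): "concept:seizure_history",
    ("toolC", "sz"): "concept:seizure_history",
}

# Hypothetical per-item converters so that answers share one unit or encoding.
CONVERTERS = {
    ("toolB", "item_7"): lambda years: float(years) * 12,  # assume toolB records years
    ("toolC", "sz"): lambda answer: answer == "Y",  # assume toolC answers "Y"/"N"
}


def harmonize(records):
    """records: iterable of (cohort, patient_id, tool, item, value) tuples.

    Returns ({(cohort, patient_id): {concept: [values]}}, set of unmapped (tool, item)).
    """
    patients = defaultdict(lambda: defaultdict(list))
    unmapped = set()
    for cohort, pid, tool, item, value in records:
        concept = CONCEPT_MAP.get((tool, item))
        if concept is None:
            unmapped.add((tool, item))  # left for manual curation
            continue
        convert = CONVERTERS.get((tool, item), lambda v: v)  # default: keep value as-is
        patients[(cohort, pid)][concept].append(convert(value))
    # Convert nested defaultdicts to plain dicts for a cleaner result.
    return {key: dict(concepts) for key, concepts in patients.items()}, unmapped


if __name__ == "__main__":
    rows = [
        ("SSC", "s1", "toolA", "q12", 14),
        ("AGRE", "a1", "toolB", "item_7", "1.5"),
        ("AGRE", "a1", "toolC", "sz", "N"),
        ("AC", "c1", "toolD", "x9", "?"),
    ]
    harmonized, todo = harmonize(rows)
    print(harmonized)
    print("unmapped:", todo)
```

Patients are keyed by (cohort, patient_id) because this sketch does not attempt to detect the same child enrolled in several cohorts; that would require a separate record-linkage step.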

Collecting clinical data is expensive and time-consuming. Even the simplest blood test requires a nurse or a doctor to collect a sample and an infrastructure to perform the analysis. Large amounts of clinical data are collected daily during health care but are very difficult to access for research. Specific platforms such as clinical data warehouses (CDW) were therefore designed to enable the secondary use, for research purposes, of data collected during health care. Boston Children's Hospital (BCH) has such a CDW, and a significant proportion of the children with ASD in the SSC are also treated at this hospital. Integrating anonymized data from the BCH CDW with SSC clinical data will increase the amount of phenotypic data available and thus enable more accurate analyses.

Therefore, this project aims to: (i) create a patient-centric information commons containing harmonized data from three major cohorts of autistic children (SSC, AGRE and AC); (ii) link SSC data to the anonymized health care data from BCH to strengthen the analyses; and (iii) perform PheWAS analyses on this new platform to delineate autism subtypes.