We are fortunate to be living in an era of twin biomedical data surges: a burgeoning representation of human phenotypes in the medical records of our healthcare systems, and high-throughput sequencing making rapid technological advances. The difficulty representing genomic data and its annotations has almost by itself led to the recognition of a biomedical “Big Data” challenge, and the complexity of healthcare data only compounds the problem to the point that coherent representation of both systems on the same platform seems insuperably difficult. We investigated the capability for complex, integrative genomic and clinical queries to be supported in the Informatics for Integrating Biology and the Bedside (i2b2) translational software package. Three different data integration approaches were developed: The first is based on Sequence Ontology, the second is based on the tranSMART engine, and the third on CouchDB. These novel methods for representing and querying complex genomic and clinical data on the i2b2 platform are available today for advancing precision medicine.
When developed jointly with clinical information systems, clinical data warehouses (CDWs) facilitate the reuse of healthcare data and leverage clinical research.
To describe both data access and use for clinical research, epidemiology and health services research of the “Hôpital Européen Georges Pompidou” (HEGP) CDW.
The CDW has been developed since 2008 using an i2b2 platform. It was made available to health professionals and researchers in October 2010. Procedures to access data have been implemented and different access levels have been distinguished according to the nature of queries.
As of July 2016, the CDW contained the consolidated data of over 860,000 patients followed since the opening of the HEGP hospital in July 2000. These data correspond to more than 122 million clinical item values, 124 million biological item values, and 3.7 million free text reports. The ethics committee of the hospital evaluates all CDW projects that generate secondary data marts. Characteristics of the 74 research projects validated between January 2011 and December 2015 are described.
The use of the HEGP CDW is a key facilitator for clinical research studies. However, it required substantial methodological and organizational support from a biomedical informatics department.
Assessment of drug and vaccine effects by combining information from different healthcare databases in the European Union requires extensive efforts in the harmonization of codes as different vocabularies are being used across countries. In this paper, we present a web application called CodeMapper, which assists in the mapping of case definitions to codes from different vocabularies, while keeping a transparent record of the complete mapping process.
CodeMapper builds upon coding vocabularies contained in the Metathesaurus of the Unified Medical Language System. The mapping approach consists of three phases. First, medical concepts are automatically identified in a free-text case definition. Second, the user revises the set of medical concepts by adding or removing concepts, or expanding them to related concepts that are more general or more specific. Finally, the selected concepts are projected to codes from the targeted coding vocabularies. We evaluated the application by comparing codes that were automatically generated from case definitions by applying CodeMapper's concept identification and successive concept expansion, with reference codes that were manually created in a previous epidemiological study.
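The second and third phases described above (concept expansion and projection to target vocabularies) can be sketched roughly as follows. This is a minimal illustration, not CodeMapper's implementation: the concept identifiers, the `NARROWER` hierarchy, the `CODES` table, and the vocabulary names `VOCAB_A` and `VOCAB_B` are all hypothetical stand-ins for content that would come from the UMLS Metathesaurus.

```python
from typing import Dict, Set

# Hypothetical toy data: identifiers, hierarchy, and codes are invented for
# illustration and are not taken from the UMLS Metathesaurus.
NARROWER: Dict[str, Set[str]] = {
    "C_cough": {"C_chronic_cough", "C_dry_cough"},
    "C_chronic_cough": set(),
    "C_dry_cough": set(),
}
CODES: Dict[str, Dict[str, Set[str]]] = {
    "C_cough": {"VOCAB_A": {"A01"}, "VOCAB_B": {"B10"}},
    "C_chronic_cough": {"VOCAB_A": {"A01.1"}},
    "C_dry_cough": {"VOCAB_B": {"B10.2"}},
}


def expand_narrower(concepts: Set[str], steps: int = 1) -> Set[str]:
    """Add more specific concepts, repeating the expansion `steps` times."""
    current = set(concepts)
    for _ in range(steps):
        added: Set[str] = set()
        for concept in current:
            added |= NARROWER.get(concept, set())
        current |= added
    return current


def project(concepts: Set[str], vocabulary: str) -> Set[str]:
    """Collect the codes of one target vocabulary for the selected concepts."""
    codes: Set[str] = set()
    for concept in concepts:
        codes |= CODES.get(concept, {}).get(vocabulary, set())
    return codes


identified = {"C_cough"}               # phase 1: result of concept identification
revised = expand_narrower(identified)  # phase 2: expansion to more specific concepts
codes_a = project(revised, "VOCAB_A")  # phase 3: projection to one vocabulary
```

Keeping the concept set separate from the projected codes is what makes the mapping traceable: every code can be followed back to the concept, and the expansion step, that introduced it.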
Automated concept identification alone had a sensitivity of 0.246 and positive predictive value (PPV) of 0.420 for reproducing the reference codes. Three successive steps of concept expansion increased sensitivity to 0.953 and PPV to 0.616.
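For readers less familiar with these two measures, the sketch below shows how sensitivity and PPV can be computed when generated codes are compared against a reference code set. The code sets are hypothetical and do not reproduce the study data; treating each code as a single unit of comparison is an assumption.

```python
from typing import Set, Tuple


def sensitivity_ppv(generated: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) of generated codes against reference codes."""
    true_positives = len(generated & reference)
    # Sensitivity: share of reference codes that were reproduced.
    sensitivity = true_positives / len(reference) if reference else 0.0
    # PPV: share of generated codes that are in the reference set.
    ppv = true_positives / len(generated) if generated else 0.0
    return sensitivity, ppv


# Hypothetical example codes, for illustration only.
reference = {"A01", "A01.1", "A02", "B10"}
generated = {"A01", "B10", "Z99"}
sens, ppv = sensitivity_ppv(generated, reference)
```

In this toy case two of four reference codes are reproduced and two of three generated codes are correct, which illustrates why expansion can raise sensitivity while PPV moves independently.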
Automatic concept identification in the case definition alone was insufficient to reproduce the reference codes, but CodeMapper's operations for concept expansion provide an effective, efficient, and transparent way for reproducing the reference codes.
The heterogeneity of patient phenotype data is an impediment to research into the origins and progression of neuropsychiatric disorders. This difficulty is compounded in the case of rare disorders such as Phelan-McDermid Syndrome (PMS) by the paucity of patient clinical data. PMS is a rare syndromic genetic cause of autism and intellectual deficiency. In this paper, we describe the Phelan-McDermid Syndrome Data Network (PMS_DN), a platform that facilitates research into phenotype–genotype correlation and progression of PMS by: a) integrating knowledge of patient phenotypes extracted from Patient Reported Outcomes (PRO) data and clinical notes (two heterogeneous, underutilized sources of knowledge about patient phenotypes) with curated genetic information from the same patient cohort and b) making this integrated knowledge, along with a suite of statistical tools, available free of charge to authorized investigators on a Web portal https://pmsdn.hms.harvard.edu. PMS_DN is a Patient Centric Outcomes Research Initiative (PCORI) where patients and their families are involved in all aspects of the management of patient data in driving research into PMS. To foster collaborative research, PMS_DN also makes patient aggregates from this knowledge available to authorized investigators using distributed research networks such as the PCORnet PopMedNet. PMS_DN is hosted on a scalable cloud-based environment and complies with all patient data privacy regulations. As of October 31, 2016, PMS_DN integrates high-quality knowledge extracted from the clinical notes of 112 patients and curated genetic reports of 176 patients with preprocessed PRO data from 415 patients.
The National Health and Nutrition Examination Survey (NHANES) is a population survey implemented by the Centers for Disease Control and Prevention (CDC) to monitor the health of the United States, whose data are publicly available in hundreds of files. This Data Descriptor describes a single unified and universally accessible data file, merging across 255 separate files and stitching data across 4 surveys, encompassing 41,474 individuals and 1,191 variables. The variables consist of phenotype and environmental exposure information on each individual, specifically (1) demographic information, (2) physical exam results (e.g., height, body mass index), (3) laboratory results (e.g., cholesterol, glucose, and environmental exposures), and (4) questionnaire items. Second, the data descriptor describes a dictionary to enable analysts to find variables by category and human-readable description. The datasets are available on DataDryad and a hands-on analytics tutorial is available on GitHub. Through a new big data platform, BD2K Patient Centered Information Commons (http://pic-sure.org), we provide a new way to browse the dataset via a web browser (https://nhanes.hms.harvard.edu) and provide an application programming interface for programmatic access.
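The merging and stitching described above can be pictured with a small sketch: per-topic tables are joined on a respondent identifier, and the per-survey results are then stacked. This is only an illustration under stated assumptions, not the procedure used to build the published file. The join key name `SEQN` is assumed here, and the variable names (`age`, `bmi`, `glucose`) and survey labels are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_on_key(tables: List[List[Row]], key: str = "SEQN") -> List[Row]:
    """Outer-join several tables on `key`; variables missing for a person stay absent."""
    merged: Dict[object, Row] = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return [merged[k] for k in sorted(merged)]


def stack_surveys(surveys: Dict[str, List[Row]]) -> List[Row]:
    """Concatenate per-survey tables, tagging each row with its survey label."""
    return [dict(row, survey=label) for label, rows in surveys.items() for row in rows]


# Hypothetical miniature files for one survey (values invented for illustration).
demographics = [{"SEQN": 1, "age": 34}, {"SEQN": 2, "age": 61}]
exam = [{"SEQN": 1, "bmi": 27.1}]
lab = [{"SEQN": 2, "glucose": 5.4}]

survey_a = merge_on_key([demographics, exam, lab])
unified = stack_surveys({"survey_a": survey_a, "survey_b": [{"SEQN": 3, "age": 48}]})
```

An outer join is used so that a respondent who appears in only some files is kept rather than dropped, which matters when not every individual was given every exam or laboratory test.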
The recent announcement of the Precision Medicine Initiative by President Obama has brought precision medicine (PM) to the forefront for healthcare providers, researchers, regulators, innovators, and funders alike. As technologies continue to evolve and datasets grow in magnitude, a strong computational infrastructure will be essential to realize PM's vision of improved healthcare derived from personal data. In addition, informatics research and innovation affords a tremendous opportunity to drive the science underlying PM. The informatics community must lead the development of technologies and methodologies that will increase the discovery and application of biomedical knowledge through close collaboration between researchers, clinicians, and patients. This perspective highlights seven key areas that are in need of further informatics research and innovation to support the realization of PM.
INTRODUCTION: We see increased use of existing observational data in order to achieve fast and transparent production of empirical evidence in health care research. Multiple databases are often used to increase power, to assess rare exposures or outcomes, or to study diverse populations. For privacy and sociological reasons, original data on individual subjects cannot be shared, requiring a distributed network approach where data processing is performed prior to data sharing.
CASE DESCRIPTIONS AND VARIATION AMONG SITES: We created a conceptual framework distinguishing three steps in local data processing: (1) data reorganization into a data structure common across the network; (2) derivation of study variables not present in original data; and (3) application of study design to transform longitudinal data into aggregated data sets for statistical analysis. We applied this framework to four case studies to identify similarities and differences in the United States and Europe: Exploring and Understanding Adverse Drug Reactions by Integrative Mining of Clinical Records and Biomedical Knowledge (EU-ADR), Observational Medical Outcomes Partnership (OMOP), the Food and Drug Administration's (FDA's) Mini-Sentinel, and the Italian network-the Integration of Content Management Information on the Territory of Patients with Complex Diseases or with Chronic Conditions (MATRICE).
FINDINGS: National networks (OMOP, Mini-Sentinel, MATRICE) all adopted shared procedures for local data reorganization. The multinational EU-ADR network needed locally defined procedures to reorganize its heterogeneous data into a common structure. Derivation of new data elements was centrally defined in all networks but the procedure was not shared in EU-ADR. Application of study design was a common and shared procedure in all the case studies. Computer procedures were embodied in different programming languages, including SAS, R, SQL, Java, and C++.
CONCLUSION: Using our conceptual framework we found several areas that would benefit from research to identify optimal standards for production of empirical knowledge from existing databases.
an opportunity to advance evidence-based care management. In addition, formalized CM outcomes assessment methodologies will enable us to compare CM effectiveness across health delivery settings.
Due to the heterogeneity of existing European sources of observational healthcare data, data source-tailored choices are needed to execute multi-data source, multi-national epidemiological studies. This makes transparent documentation paramount. In this proof-of-concept study, a novel standard data derivation procedure was tested in a set of heterogeneous data sources. Identification of subjects with type 2 diabetes (T2DM) was the test case. We included three primary care data sources (PCDs), three record linkage of administrative and/or registry data sources (RLDs), one hospital and one biobank. Overall, data from 12 million subjects from six European countries were extracted. Based on a shared event definition, sixteen standard algorithms (components) useful to identify T2DM cases were generated through a top-down/bottom-up iterative approach. Each component was based on one single data domain among diagnoses, drugs, diagnostic test utilization and laboratory results. Diagnoses-based components were subclassified considering the healthcare setting (primary, secondary, inpatient care). The Unified Medical Language System was used for semantic harmonization within data domains. Individual components were extracted and the proportion of the population identified was compared across data sources. Drug-based components performed similarly in RLDs and PCDs, unlike diagnoses-based components. Using components as building blocks, logical combinations with AND, OR, AND NOT were tested and local experts recommended their preferred data source-tailored combination. The population identified per data source by the resulting algorithms varied from 3.5% to 15.7%; however, age-specific results were fairly comparable. The impact of individual components was assessed: diagnoses-based components identified the majority of cases in PCDs (93-100%), while drug-based components were the main contributors in RLDs (81-100%).
The proposed data derivation procedure allowed the generation of data source-tailored case-finding algorithms in a standardized fashion, facilitated transparent documentation of the process and benchmarking of data sources, and provided bases for interpretation of possible inter-data source inconsistency of findings in future studies.
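The idea of using components as building blocks combined with AND, OR and AND NOT maps naturally onto set operations over subject identifiers. The sketch below is illustrative only: the component names, the subject identifiers, the population size, and the particular combination chosen are hypothetical and are not the algorithms recommended in the study.

```python
from typing import Dict, Set

# Hypothetical components: each maps to the subjects it identifies.
components: Dict[str, Set[int]] = {
    "diagnosis_primary_care": {1, 2, 3, 4},
    "diagnosis_inpatient": {3, 5},
    "drug_antidiabetic": {2, 3, 6, 7},
    "drug_insulin_only": {7},
}
population_size = 50  # hypothetical denominator for one data source

# OR: a diagnosis recorded in any healthcare setting.
any_diagnosis = components["diagnosis_primary_care"] | components["diagnosis_inpatient"]
# AND: a diagnosis and a drug record for the same subject.
diagnosis_and_drug = any_diagnosis & components["drug_antidiabetic"]
# AND NOT: drug-identified subjects, excluding a hypothetical exclusion component.
drug_not_insulin_only = components["drug_antidiabetic"] - components["drug_insulin_only"]

# One possible data source-tailored combination.
cases = any_diagnosis | drug_not_insulin_only


def proportion(subjects: Set[int]) -> float:
    """Share of the data source population identified by a set of subjects."""
    return len(subjects) / population_size


# Contribution of diagnosis-based components to the final case set.
diagnosis_share = len(cases & any_diagnosis) / len(cases)
```

Because each component stays a named set, the same building blocks can be recombined per data source and the contribution of each component to the final case set can be reported alongside the result.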
OBJECTIVE: The purpose of this study is to determine whether the posterior radioscaphoid angle, a marker of posterior displacement of the scaphoid, is associated with degenerative joint disease in patients with scapholunate ligament tears.
MATERIALS AND METHODS: Images from 150 patients with wrist pain who underwent CT arthrography and radiography were retrospectively evaluated. Patients with and without scapholunate ligament ruptures were divided into two groups according to CT arthrography findings. The presence of degenerative changes (scapholunate advanced collapse [SLAC] wrist) was evaluated and graded on conventional radiographs. Images were evaluated by two readers independently, and an adjudicator analyzed the discordant cases. Posterior radioscaphoid angle values were correlated with CT arthrography and radiographic findings. The association between posterior radioscaphoid angle and degenerative joint disease was evaluated. Scapholunate and radiolunate angles were considered in the analysis.
RESULTS: The posterior radioscaphoid angle was measurable in all patients, with substantial interobserver agreement (intraclass correlation coefficient, 0.75). The posterior radioscaphoid angle performed better than did the scapholunate and radiolunate angles in the differentiation of patients with and without SLAC wrist (p < 0.02). Posterior radioscaphoid angles greater than 114° presented an 80.0% sensitivity and 89.7% specificity for the detection of SLAC wrist.
CONCLUSION: Posterior radioscaphoid angles were strongly associated with degenerative wrist disease, with potential prognostic implications in patients with wrist trauma and scapholunate ligament ruptures.
The worldwide incidence of melanoma is rising faster than that of any other cancer, and prognosis for patients with metastatic disease is poor. Current targeted therapies are limited in their durability and/or effect size in certain patient populations due to acquired mechanisms of resistance. Thus, the development of synergistic combinatorial treatment regimens holds great promise to improve patient outcomes. We have previously shown that a model for in-silico knowledge discovery, Translational Ontology-anchored Knowledge Discovery Engine (TOKEn), is able to generate valid relationships between biomolecular and clinical phenotypes. In this study, we have aggregated observational and canonical knowledge consisting of melanoma-related biomolecular entities and targeted therapeutics in a computationally tractable model. We demonstrate here that the explicit linkage of therapeutic modalities with biomolecular underpinnings of melanoma utilizing the TOKEn pipeline yields a set of informed relationships that have the potential to generate combination therapy strategies.
OBJECTIVE: To evaluate the impact of computerized provider order entry (CPOE) at the bedside on medical students' training.
MATERIALS AND METHODS: We conducted a randomized cross-controlled educational trial on medical students during two clerkship rotations in three departments, assessing the impact of the use of CPOE on their ability to place adequate monitoring and therapeutic orders using a written test before and after each rotation. Students' satisfaction with their practice and the order placement system was surveyed. A multivariate mixed model was used to take individual students and chief resident (CR) effects into account. Factorial analysis was applied on the satisfaction questionnaire to identify dimensions, and scores were compared on these dimensions.
RESULTS: Thirty-six students show no better progress (beginning and final test means = 69.87 and 80.98 points out of 176 for the control group, 64.60 and 78.11 for the CPOE group, p = 0.556) during their rotation in either group, even after adjusting for each student and CR, but show greater satisfaction with patient care and greater involvement in the medical team in the CPOE group (p = 0.035). Both groups have a favorable opinion regarding CPOE as an educational tool, especially because of the order reviewing by the supervisor.
CONCLUSION: This is the first randomized controlled trial assessing the performance of CPOE in terms of both progress in prescription ability and student satisfaction. The absence of effect on medical skills must be weighed against the short time scale and small sample size. However, students are more satisfied when using CPOE than with usual training.
Graft-versus-host disease (GVHD) is a known risk factor for invasive aspergillosis (IA), but remains poorly studied in relation to Clostridium difficile infection (CDI). We report the case of a 58-year-old patient who developed IA within a protected room, CDI and GVHD after allogeneic peripheral blood stem cell transplantation (PBSCT). Factors associated with this complex condition in patients receiving allogeneic PBSCT need to be identified.
This work proposes an integrated workflow for secondary use of medical data to serve feasibility studies, and the prescreening and monitoring of research studies. All research issues are initially addressed by the Clinical Research Office through a research portal and subsequently redirected to relevant experts in the determined field of concentration. For secondary use of data, the workflow is then based on the clinical data warehouse of the hospital. A datamart with potentially eligible research candidates is constructed. Datamarts can produce aggregated data, de-identified data, or identified data, according to the kind of study being treated. In conclusion, integrating the secondary use of data process into a general research workflow allows visibility of information technologies and improves the accessibility of clinical data.
In this paper, we present a semantic, metadata based knowledge discovery methodology for identifying teams of researchers from diverse backgrounds who can collaborate on interdisciplinary research projects: projects in areas that have been identified as high-impact areas at The Ohio State University. This methodology involves the semantic annotation of keywords and the postulation of semantic metrics to improve the efficiency of the path exploration algorithm as well as to rank the results. Results indicate that our methodology can discover groups of experts from diverse areas who can collaborate on translational research projects.
OBJECTIVE: Matching healthcare staff resources to patient needs in the ICU is a key factor for quality of care. We aimed to assess the impact of the staffing-to-patient ratio and workload on ICU mortality.
DESIGN: We performed a multicenter longitudinal study using routinely collected hospital data.
SETTING: Information pertaining to every patient in eight ICUs from four university hospitals from January to December 2013 was analyzed.
PATIENTS: A total of 5,718 inpatient stays were included.
MEASUREMENTS AND MAIN RESULTS: We used a shift-by-shift varying measure of the patient-to-caregiver ratio in combination with workload to establish their relationships with ICU mortality over time, excluding patients with decision to forego life-sustaining therapy. Using a multilevel Poisson regression, we quantified ICU mortality-relative risk, adjusted for patient turnover, severity, and staffing levels. The risk of death was increased by 3.5 (95% CI, 1.3-9.1) when the patient-to-nurse ratio was greater than 2.5, and it was increased by 2.0 (95% CI, 1.3-3.2) when the patient-to-physician ratio exceeded 14. The highest ratios occurred more frequently during the weekend for nurse staffing and during the night for physicians (p < 0.001). High patient turnover (adjusted relative risk, 5.6 [2.0-15.0]) and the volume of life-sustaining procedures performed by staff (adjusted relative risk, 5.9 [4.3-7.9]) were also associated with increased mortality.
CONCLUSIONS: This study proposes evidence-based thresholds for patient-to-caregiver ratios, above which patient safety may be endangered in the ICU. Real-time monitoring of staffing levels and workload is feasible for adjusting caregivers' resources to patients' needs.
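As a rough illustration of how the reported thresholds could feed a shift-level monitoring rule, the sketch below flags shifts whose patient-to-nurse ratio is greater than 2.5 or whose patient-to-physician ratio exceeds 14. The `Shift` structure, the example shifts, and the choice to flag a shift with no staff on record are assumptions made for illustration; the study's regression model and workload measure are not reproduced here.

```python
from dataclasses import dataclass
from typing import List

NURSE_RATIO_THRESHOLD = 2.5     # patient-to-nurse ratio reported in the abstract
PHYSICIAN_RATIO_THRESHOLD = 14  # patient-to-physician ratio reported in the abstract


@dataclass
class Shift:
    label: str
    patients: int
    nurses: int
    physicians: int


def exceeds(patients: int, staff: int, threshold: float) -> bool:
    """True when the patient-to-staff ratio is strictly above the threshold."""
    if staff == 0:
        # Assumption: patients with no staff on record always count as exceeding.
        return patients > 0
    return patients / staff > threshold


def staffing_flags(shift: Shift) -> List[str]:
    """List which caregiver ratios are above their threshold for one shift."""
    flags: List[str] = []
    if exceeds(shift.patients, shift.nurses, NURSE_RATIO_THRESHOLD):
        flags.append("nurse")
    if exceeds(shift.patients, shift.physicians, PHYSICIAN_RATIO_THRESHOLD):
        flags.append("physician")
    return flags


# Hypothetical shifts, for illustration only.
weekday_day = Shift("weekday day", patients=12, nurses=6, physicians=1)
weekend_night = Shift("weekend night", patients=16, nurses=5, physicians=1)
```

The comparisons are strict, matching the wording "greater than 2.5" and "exceeded 14", so a shift sitting exactly on a threshold is not flagged.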
BACKGROUND: Children with inflammatory bowel disease are at risk of vaccine-preventable diseases mostly due to immunosuppressive drugs.
AIM: To evaluate coverage after an awareness campaign informing patients, their parents and general practitioner about the vaccination schedule.
METHODS: Vaccination coverage was first evaluated and then followed by an awareness campaign on the risk of infection via postal mail. The trial is a case-control study on the same patients before and after the awareness campaign. Overall, 92 children were included. A questionnaire was then completed during a routine appointment to collect data including age at diagnosis, age at data collection, treatment history, and vaccination status.
RESULTS: Vaccination rates significantly increased for vaccines against diphtheria-tetanus-poliomyelitis (92% vs. 100%), Haemophilus influenzae (88% vs. 98%), hepatitis B (52% vs. 71%), pneumococcus (36% vs. 57%), and meningococcus C (17% vs. 41%) (p<0.05). Children who were older at diagnosis were 1.26 times more likely to be up-to-date with a minimum vaccination schedule (diphtheria-tetanus-poliomyelitis, pertussis, H. influenzae, measles-mumps-rubella, tuberculosis) (p=0.002).
CONCLUSION: Informing inflammatory bowel disease patients, their parents and general practitioner about the vaccination schedule via postal mail is easy, inexpensive, reproducible, and increases vaccination coverage. This method reinforces information on the risk of infection during routine visits.
OBJECTIVES: In France, medical students regularly complain about the shortcomings of their theoretical training and the need to adapt it to better fit their needs. The goal was to evaluate the theoretical teaching practices in postgraduate medical studies by: 1) collecting data from medical students in different medical faculties in France; 2) comparing these data with expected practices when possible; and 3) proposing several lines of improvement.
METHODS: A survey of theoretical practices in the 3rd cycle of medical studies was conducted by self-administered questionnaires which were free of charge, anonymous, and administered electronically from July 3 to October 31, 2013 to all medical students in France.
RESULTS: National, inter-regional, regional and field internship educational content was absent in respectively 50.5%, 42.8%, 26.0% and 30.2% of cases. Medical students follow complementary training due to insufficient DES and/or DESC 2 training in 43.7% of cases or as part of a professional project in 54.9% of cases. The knowledge sought by medical students concerns the following crosscutting topics: career development (58.9%), practice management (50.7%), medical English (50.4%) and their specialty organization (49.9%). Fifty-four point one percent would like to be evaluated on their theoretical training on an annual basis.
CONCLUSION: The results of this first national survey give insights into the theoretical teaching conditions in postgraduate medical education in France and the aspirations of medical students.
While risk of acute kidney injury (AKI) is a well-documented adverse effect of some drugs, few studies have assessed the relationship between drug-drug interactions (DDIs) and AKI. Our objective was to develop an algorithm capable of detecting potential signals on this relationship by retrospectively mining data from electronic health records.
MATERIAL AND METHODS:
Data were extracted from the clinical data warehouse (CDW) of the Hôpital Européen Georges Pompidou (HEGP). AKI was defined as the first level of the RIFLE criteria, that is, an increase of ≥50 % from baseline creatinine. Algorithm accuracy was tested on 20 single drugs, 10 nephrotoxic and 10 non-nephrotoxic. We then tested 45 pairs of non-nephrotoxic drugs, among the most prescribed at our hospital and representing distinct pharmacological classes for DDIs.
Sensitivity and specificity were 50 % [95 % confidence interval (CI) 23.66-76.34] and 90 % (95 % CI 59.58-98.21), respectively, for single drugs. Our algorithm confirmed a previously identified signal concerning clarithromycin and calcium-channel blockers (unadjusted odds ratio (ORu) 2.92; 95 % CI 1.11-7.69, p = 0.04). Among the 45 drug pairs investigated, we identified a signal concerning 55 patients in association with bromazepam and hydroxyzine (ORu 1.66; 95 % CI 1.23-2.23). This signal was not confirmed after a chart review. Even so, AKI and co-prescription were confirmed for 96 % (95 % CI 88-99) and 88 % (95 % CI 76-94) of these patients, respectively.
Data mining techniques on CDW can foster the detection of adverse drug reactions when drugs are used alone or in combination.
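The wide confidence intervals around the single-drug sensitivity and specificity follow from the small denominators (10 drugs per group). The sketch below computes a Wilson score interval for a proportion. The abstract does not state which interval method was used, so the choice of Wilson here is an assumption, as are the counts 5 of 10 and 9 of 10, which are inferred from the reported 50 % and 90 %. Under those assumptions the bounds come out close to the reported intervals.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (n must be positive)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    p = successes / n
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return centre - half_width, centre + half_width


# Counts inferred from the reported percentages (assumption, see above).
sensitivity_ci = wilson_interval(5, 10)  # 5 of 10 nephrotoxic drugs detected
specificity_ci = wilson_interval(9, 10)  # 9 of 10 non-nephrotoxic drugs not flagged
```

Unlike the simple normal approximation, the Wilson interval stays within 0 and 1 and is asymmetric for proportions near the extremes, which is why it is often preferred for samples this small.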
Background: The objective of this study was to measure the prevalence of inflammatory bowel disease (IBD) among patients with autism spectrum disorders (ASD), which has not been well described previously.
Methods: The rates of IBD among patients with and without ASD were measured in 4 study populations with distinct modes of ascertainment: a health care benefits company, 2 pediatric tertiary care centers, and a national ASD repository. The rates of IBD (established through International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] codes) were compared with respective controls and combined using a Stouffer meta-analysis. Clinical charts were also reviewed for IBD among patients with ICD-9-CM codes for both IBD and ASD at one of the pediatric tertiary care centers. This expert-verified rate was compared with the rate in the repository study population (where IBD diagnoses were established by expert review) and in nationally reported rates for pediatric IBD.
Results: In all of the case-control study populations, the rates of IBD-related ICD-9-CM codes for patients with ASD were significantly higher than those of their respective controls (Stouffer meta-analysis, P < 0.001). Expert-verified rates of IBD among patients with ASD were 7 of 2728 patients in one study population and 16 of 7201 in a second study population. The age-adjusted prevalence of IBD among patients with ASD was higher than that of their respective controls and nationally reported rates of pediatric IBD.
Conclusions: Across each population with different kinds of ascertainment, there was a consistent and statistically significant higher prevalence of IBD in patients with ASD than in their respective controls and nationally reported rates for pediatric IBD.
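The Stouffer method used to combine results across study populations converts each p-value to a z-score, sums the z-scores, and rescales by the square root of the number of studies. The sketch below shows the unweighted form for one-sided p-values. Whether the study applied weights or a different sidedness is not stated in the abstract, so both are assumptions, and the example p-values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence


def stouffer_combined_p(p_values: Sequence[float]) -> float:
    """Unweighted Stouffer combination of one-sided p-values (each strictly in (0, 1))."""
    normal = NormalDist()
    # Convert each p-value to the z-score with that upper-tail probability.
    z_scores = [normal.inv_cdf(1 - p) for p in p_values]
    # Under the null hypothesis, the rescaled sum is again standard normal.
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1 - normal.cdf(combined_z)


# Hypothetical p-values from four independent study populations.
combined = stouffer_combined_p([0.05, 0.05, 0.05, 0.05])
```

Four independent results that are each only borderline combine into much stronger evidence, which is the property that makes the method useful when the same comparison is repeated in populations with different modes of ascertainment.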