Publications

2012
Pariente A, Avillach P, Salvo F, Thiessard F, Miremont-Salamé G, Fourrier-Reglat A, Haramburu F, Bégaud B, Moore N. Effect of competition bias in safety signal generation: analysis of a research database of spontaneous reports in France. Drug Saf 2012;35(10):855-64.
BACKGROUND: Automated disproportionality analysis of spontaneous reporting is increasingly used routinely. It can theoretically be influenced by a competition bias for signal detection owing to the presence of reports related to well-established drug-event associations. OBJECTIVE: The aim of the study was to explore the effects of competition bias on safety signals generated from a large spontaneous reporting research database. METHODS: Using the case/non-case approach in the French spontaneous reporting research database, which includes data of reporting in France from January 1986 to December 2001, the effects of the competition bias were explored by generating safety signals associated with six events of interest (gastric and oesophageal haemorrhages, central nervous system haemorrhage and cerebrovascular accidents, ischaemic coronary disorders, migraine headaches, muscle pains, and hepatic enzymes and function abnormalities) before and after removing from the database reports relating to drugs known to be strongly associated with these events, whether they constituted cases or non-cases. As this study was performed on a closed database (last data entered 31 December 2001), potential signals unmasked by removal were considered as real signals if no or only incomplete knowledge about the association was available from the literature before 1 January 2002. RESULTS: For gastric and oesophageal haemorrhages, after removing reports involving antithrombotic agents or NSAIDs, three potential signals were unmasked (prednisone, rivastigmine and isotretinoin). For central nervous system haemorrhage and cerebrovascular accidents, after removing reports involving antithrombotic agents, three potential signals were unmasked (ethinylestradiol, interferon-α-2B and methylprednisolone). For ischaemic coronary disorders, after removing reports involving anthracyclines, bleomycine, anti-HIV drugs or triptans, one potential signal was unmasked (ondansetron). 
For migraine headaches, after removing reports involving nitrates, calcium channel blockers, opioid analgesics or intravenous immunoglobulins, six potential signals were unmasked (ammonium chloride, leflunomide, milnacipran, montelukast, proguanil and pyridostigmine). For muscle pains, after removing reports involving statins or fibrates, seven potential signals were unmasked (hydroxychloroquine, lactulose, levodopa in combination with dopadecarboxylase inhibitor, nevirapine, nomegestrol, ritonavir and stavudine). Finally, for hepatic enzymes and function abnormalities, after removing reports involving NSAIDs, anilides, antituberculosis drugs, antiepileptics, ketoconazole, tacrine, or amineptine, two potential signals were unmasked (caffeine, metformin). Of all these unmasked potential signals, ten appeared non/incompletely documented as at 1 January 2002 and were considered as real signals, with three of these later being confirmed by the literature and finally considered as true positives (isotretinoin, methylprednisolone and milnacipran). CONCLUSION: This study confirms that a competition bias can occur when performing safety signal generation in spontaneous reporting databases. The minimization of this bias could lead to previously masked signals being revealed.
Sahut D'Izarn M, Caumont Prim A, Planquette B, Revel MP, Avillach P, Chatellier G, Sanchez O, Meyer G. Risk factors and clinical outcome of unsuspected pulmonary embolism in cancer patients: a case-control study. J Thromb Haemost 2012;10(10):2032-8.
BACKGROUND: Little is known about the risk factors and outcome of unsuspected pulmonary embolism (UPE) in cancer patients. OBJECTIVES: To assess the risk factors and outcome of UPE in cancer patients. METHODS: The charts of 66 patients diagnosed with UPE were reviewed. Two control groups were selected: 132 cancer patients without pulmonary embolism (PE) and 65 cancer patients with clinically suspected PE. Variables associated with UPE were identified by multivariable analysis. Six-month survival and recurrent venous thromboembolism were compared by use of Cox proportional analysis. RESULTS: Twenty-seven (40.9%) patients with UPE had symptoms suggesting PE. Adenocarcinoma (odds ratio [OR] 4.45; 95% confidence interval [CI] 1.98-9.97), advanced age (OR 1.18; 95% CI 1.02-1.38), recent chemotherapy (OR 4.62; 95% CI 2.26-9.44), performance status > 2 (OR 7.31; 95% CI 1.90-28.15) and previous venous thromboembolism (OR 4.47; 95% CI 1.16-17.13) were associated with UPE. When adjusted for tumor stage and performance status, 6-month mortality did not differ between patients with UPE and patients without PE (hazard ratio 1.40; 95% CI 0.53-3.66; P = 0.50). Patients with UPE were more likely to have central venous catheters and chemotherapy and less likely to have proximal clots than patients with clinically suspected PE. Recurrent venous thromboembolism occurred in 6.1% and 7.7% of patients with UPE and symptomatic PE, respectively. CONCLUSION: UPE is not associated with an increased risk of death. Patients with clinically suspected PE and those with UPE have similar risks of recurrent venous thromboembolism.
2011
Trifirò G, Patadia V, Schuemie MJ, Coloma PM, Gini R, Herings R, Hippisley-Cox J, Mazzaglia G, Giaquinto C, Scotti L, Pedersen L, Avillach P, Sturkenboom MCJM, van der Lei J. EU-ADR healthcare database network vs. spontaneous reporting system database: preliminary comparison of signal detection. Stud Health Technol Inform 2011;166:25-30.
The EU-ADR project aims to exploit different European electronic healthcare records (EHR) databases for drug safety signal detection. In this paper we report the preliminary results concerning the comparison of signal detection between the EU-ADR network and two spontaneous reporting databases, the Food and Drug Administration and World Health Organization databases. EU-ADR data sources consist of eight databases in four countries (Denmark, Italy, Netherlands, and United Kingdom) that are virtually linked through a distributed data network. Custom-built software (Jerboa©) elaborates harmonized input data that are produced locally and generates aggregated data which are then stored in a central repository. Those data are subsequently analyzed through different statistics (e.g. Longitudinal Gamma Poisson Shrinker). As potential signals, all the drugs that are associated with six events of interest (bullous eruptions - BE, acute renal failure - ARF, acute myocardial infarction - AMI, anaphylactic shock - AS, rhabdomyolysis - RHABD, and upper gastrointestinal bleeding - UGIB) have been detected via different data mining techniques in the two systems. Subsequently a comparison concerning the number of drugs that could be investigated and the potential signals detected for each event in the spontaneous reporting systems (SRSs) and EU-ADR network was made. SRSs could explore, as potential signals, a larger number of drugs for the six events, in comparison to EU-ADR (range: 630-3,393 vs. 87-856), particularly for those events commonly thought to be potentially drug-induced (e.g. BE: 3,393 vs. 228). The highest proportion of signals detected in SRSs was found for BE, ARF and AS, whereas in EU-ADR it was found for ARF and UGIB. In conclusion, it seems that the EU-ADR longitudinal database network may complement traditional spontaneous reporting systems for signal detection, especially for those adverse events that are frequent in the general population and are not commonly thought to be drug-induced. The methodology for signal detection in EU-ADR is still in the development and testing phase.
2010
Dahdul WM, Balhoff JP, Engeman J, Grande T, Hilton EJ, Kothari C, Lapp H, Lundberg JG, Midford PE, Vision TJ, Westerfield M, Mabee PM. Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature. PLoS One 2010;5(5):e10708.
BACKGROUND: The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. METHODOLOGY/PRINCIPAL FINDINGS: We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. 
CONCLUSIONS/SIGNIFICANCE: The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant or wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies are selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
Balhoff JP, Dahdul WM, Kothari CR, Lapp H, Lundberg JG, Mabee P, Midford PE, Westerfield M, Vision TJ. Phenex: ontological annotation of phenotypic diversity. PLoS One 2010;5(5):e10500.
BACKGROUND: Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. METHODOLOGY/PRINCIPAL FINDINGS: Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load only those ontologies pertinent to a taxonomic group of interest. The graphical user interface was optimized for evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices. CONCLUSIONS/SIGNIFICANCE: Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.
Avillach P, Joubert M, Thiessard F, Trifirò G, Dufour J-C, Pariente A, Mougin F, Polimeni G, Catania MA, Giaquinto C, Mazzaglia G, Fornari C, Herings R, Gini R, Hippisley-Cox J, Molokhia M, Pedersen L, Fourrier-Réglat A, Sturkenboom M, Fieschi M. Design and evaluation of a semantic approach for the homogeneous identification of events in eight patient databases: a contribution to the European EU-ADR project. Stud Health Technol Inform 2010;160(Pt 2):1085-9.
The overall objective of the EU-ADR project is the design, development, and validation of a computerised system that exploits data from electronic health records and biomedical databases for the early detection of adverse drug reactions. Eight different databases, containing health records of more than 30 million European citizens, are involved in the project. Unique queries cannot be performed across different databases because of their heterogeneity: Medical record and Claims databases, four different terminologies for coding diagnoses, and two languages for the information described in free text. The aim of our study was to provide database owners with a common basis for the construction of their queries. Using the UMLS, we provided a list of medical concepts, with their corresponding terms and codes in the four terminologies, which should be considered to retrieve the relevant information for the events of interest from the databases.
Pariente A, Didailler M, Avillach P, Miremont-Salamé G, Fourrier-Reglat A, Haramburu F, Moore N. A potential competition bias in the detection of safety signals from spontaneous reporting databases. Pharmacoepidemiol Drug Saf 2010;19(11):1166-71.
PURPOSE: To study whether reports related to known drug-event associations could hinder the detection of new signals by increasing the detection thresholds when using disproportionality analyses in spontaneous reporting (SR) databases. METHODS: The French SR database (2005-2006 data) was used to test this hypothesis for the following events: bleeding, headache, hepatitis, myalgia, myocardial infarction, stroke, and toxic epidermal necrolysis (TEN). For each of these, using the Proportional Reporting Ratio (PRR) and the Reporting Odds Ratio (ROR), the number of cases needed to trigger a signal out of 50, 100, and 200 reports for a hypothetical newly introduced drug was computed before and after removing from the database reports involving drugs known to be associated with the event. RESULTS: For bleeding and stroke, removing potentially competitive data resulted in a decrease of the number of cases needed to trigger a signal for a newly introduced drug for both PRR and ROR (e.g., from 9 to 4, and 5 to 3 cases out of 50 reports for bleeding and stroke, respectively, using the PRR). These numbers were unchanged or only slightly modified for the other studied events. CONCLUSIONS: Removing reports related to known drug-event associations could increase the sensitivity of signal detection in SR databases. This should be considered when using SR databases for signal detection as it could result in earlier identification of new drug-event associations.
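The threshold effect described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not state the signal criterion, so the sketch assumes a common one (lower bound of the 95% confidence interval of the PRR above 1, with at least 3 cases), and all counts are made up for illustration. The function names are hypothetical.

```python
import math

def prr(a, b, c, d):
    """Proportional Reporting Ratio for a 2x2 table.
    a: drug of interest, event      b: drug of interest, other events
    c: all other drugs, event       d: all other drugs, other events"""
    return (a / (a + b)) / (c / (c + d))

def ror(a, b, c, d):
    """Reporting Odds Ratio for the same 2x2 table."""
    return (a * d) / (b * c)

def prr_lower_bound(a, b, c, d, z=1.96):
    """Lower bound of the 95% CI of the PRR (log-normal approximation)."""
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return math.exp(math.log(prr(a, b, c, d)) - z * se)

def min_cases_for_signal(n_reports, bg_cases, bg_noncases, min_cases=3):
    """Smallest number of cases among n_reports for a new drug that
    triggers a signal under the ASSUMED criterion (lower bound > 1)."""
    for a in range(min_cases, n_reports + 1):
        b = n_reports - a
        if prr_lower_bound(a, b, bg_cases, bg_noncases) > 1:
            return a
    return None

# Made-up background counts: the full database, then the same database
# after removing reports for drugs known to cause the event.
before = min_cases_for_signal(50, bg_cases=5000, bg_noncases=95000)
after = min_cases_for_signal(50, bg_cases=500, bg_noncases=80000)
print(before, after)
```

With these invented counts the number of cases needed out of 50 reports drops from 6 to 3. Removing the competing reports lowers the background reporting rate of the event, so fewer cases are needed for the new drug to stand out.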
2009
Avillach P, Mougin F, Joubert M, Thiessard F, Pariente A, Dufour J-C, Trifirò G, Polimeni G, Catania MA, Giaquinto C, Mazzaglia G, Baio G, Herings R, Gini R, Hippisley-Cox J, Molokhia M, Pedersen L, Fourrier-Réglat A, Sturkenboom M, Fieschi M. A semantic approach for the homogeneous identification of events in eight patient databases: a contribution to the European eu-ADR project. Stud Health Technol Inform 2009;150:190-4.
The overall objective of the eu-ADR project is the design, development, and validation of a computerised system that exploits data from electronic health records and biomedical databases for the early detection of adverse drug reactions. Eight different databases, containing health records of more than 30 million European citizens, are involved in the project. Unique queries cannot be performed across different databases because of their heterogeneity: Medical record and Claims databases, four different terminologies for coding diagnoses, and two languages for the information described in free text. The aim of our study was to provide database owners with a common basis for the construction of their queries. Using the UMLS, we provided a list of medical concepts, with their corresponding terms and codes in the four terminologies, which should be considered to retrieve the relevant information for the events of interest from the databases.
Quantin C, Gouyon B, Avillach P, Ferdynus C, Sagot P, Gouyon J-B. Using discharge abstracts to evaluate a regional perinatal network: assessment of the linkage procedure of anonymous data. Int J Telemed Appl 2009;2009:181842.
To assess the Burgundy perinatal network (18 obstetrical units; 18 500 births per year), discharge abstracts and additional data were collected for all mothers and newborns. In accordance with French law, data were rendered anonymous before statistical analysis, and were linked to patients using a specific procedure. This procedure allowed data concerning each mother to be linked to those for her newborn(s). This study showed that all mothers and newborns were included in the regional database; the data for all mothers were linked to those for their infant(s) in all cases. Additional data (gestational age) were obtained for 99.9% of newborns.
2008
Quantin C, Allaert F-A, Avillach P, Fassa M, Riandey B, Trouessin G, Cohen O. Building application-related patient identifiers: what solution for a European country? Int J Telemed Appl 2008;:678302.
We propose a method utilizing a derived social security number with the same reliability as the social security number. We show that anonymity techniques classically based on unidirectional hash functions (such as the Secure Hash Algorithm SHA-2) can guarantee the security, quality, and reliability of information if these techniques are applied to the social security number. Hashing produces a strictly anonymous code that is always the same for a given individual, and thus enables patient data to be linked. Different solutions are developed and proposed in this article. Hashing the social security number will make it possible to link the information in the personal medical file to other national health information sources with the aim of completing or validating the personal medical record or conducting epidemiological and clinical research. This data linkage would meet the anonymous data requirements of the European directive on data protection.
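The linkage idea in this abstract (one stable, one-way code per individual) can be sketched in Python. This is only an illustration under stated assumptions, not the scheme proposed in the article: the use of a secret key (HMAC) is an assumption added here to resist dictionary attacks on a small identifier space, the `normalize` and `pseudonym` names are hypothetical, and the identifier shown is made up.

```python
import hashlib
import hmac

def normalize(ssn: str) -> str:
    """Strip spaces and punctuation so that the same number always
    yields the same input to the hash (assumed preprocessing step)."""
    return "".join(ch for ch in ssn if ch.isalnum()).upper()

def pseudonym(ssn: str, secret_key: bytes) -> str:
    """One-way code derived from the identifier with HMAC-SHA-256.
    The secret key is an assumption of this sketch, not a detail
    taken from the article."""
    message = normalize(ssn).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

key = b"example-key-held-by-a-trusted-party"  # hypothetical key
code_a = pseudonym("1 85 05 78 006 084 36", key)  # made-up number
code_b = pseudonym("185057800608436", key)        # same number, other layout
code_c = pseudonym("285057800608436", key)        # a different person
print(code_a == code_b, code_a == code_c)
```

The same individual always maps to the same code, so records from two sources can be joined on the code without exchanging the identifier itself.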
Avillach P, Joubert M, Fieschi M. Improving the quality of the coding of primary diagnosis in standardized discharge summaries. Health Care Manag Sci 2008;11(2):147-51.
We propose to design and test an information-processing model to help appraise the quality and the consistency of the coding, for billing, of Standardized Discharge Summaries (SDSs). We designed a model using both symbolic knowledge extracted from the NLM's UMLS and statistical knowledge. The aim is to retrieve, from the ICD-10 terms recorded in an SDS, the Principal Diagnosis (PD) at the time of coding. In 90% of cases the PD was retrieved 1st or 2nd in SDSs including three ICD-10 codes or more. This model could contribute to an automated quality control process in a hospital information system by checking consistency in coded SDSs, and could improve the income of the hospital.
Joubert M, Darmoni SJ, Avillach P, Dahamna B, Fieschi M. Using knowledge for indexing health web resources in a quality-controlled gateway. Stud Health Technol Inform 2008;136:205-10.
OBJECTIVES: The aim of this study is to suggest to indexers which MeSH terms should be considered major ones in a list of terms automatically extracted from a document. MATERIAL AND METHODS: We propose a method combining symbolic knowledge - the UMLS Metathesaurus and Semantic Network - and statistical knowledge drawn from co-occurrences of terms in the CISMeF database (a French-language quality-controlled health gateway) using data mining measures. The method was tested on a CISMeF corpus of 293 resources. RESULTS: There was a proportion of 0.37+/-0.26 major terms in the processed records. The method produced lists of terms in which the proportion of terms initially pointed out as major was 0.54+/-0.31. DISCUSSION: The method we propose reduces the number of terms that seem not useful for content description of resources, such as "check tags", but retains the most descriptive ones. Discarding these terms is accounted for by: 1) the removal, using semantic knowledge, of associations of concepts bearing no real medical significance, 2) the removal, using statistical knowledge, of associations of terms that are not statistically significant. CONCLUSION: This method can effectively assist indexers in their daily work and will soon be applied in the CISMeF system.
2007
Quantin C, Allaert FA, Fassa M, Riandey B, Avillach P, Cohen O. How to manage secure direct access of European patients to their computerized medical record and personal medical record. Stud Health Technol Inform 2007;127:246-55.
The growing number of requests from patients for direct access to their Medical Record (MR), the development of the Personal Medical Record (PMR) supervised by the patients themselves, the increasing development of patients' electronic medical records (EMRs) and worldwide internet use will make it necessary to envisage access by automated technical means. This will require several conditions to be met: a unique patient identifier, which could be based on a familial component, in order to get access to the right record anywhere in Europe; and very strict identity checks using cryptographic techniques such as those for the electronic signature, which will ensure the authentication of the request sender and the integrity of the file, but also the protection of confidentiality and the follow-up of access. The electronic medical record must also be electronically signed by the practitioner as evidence that he has given his agreement and taken liability for it. This electronic signature also prevents any kind of post-transmission falsification. This will become extremely important, especially in France, where patients will have the possibility to mask information that they do not want to appear in their personal medical record. Currently, the idea of every citizen having electronic signatures available appears positively Utopian. But this is already the case in eGovernment, eHealth and eShopping, worldwide. The same was thought about smart cards before they became generally available and useful when banks issued them.
Quantin C, Allaert F-A, Fassa M, Avillach P, Fieschi M, Cohen O. Interoperability issues regarding patient identification in Europe. Conf Proc IEEE Eng Med Biol Soc 2007;2007:6161.
Avillach P, Joubert M, Fieschi M. A model for indexing medical documents combining statistical and symbolic knowledge. AMIA Annu Symp Proc 2007;:31-5.
OBJECTIVES: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. METHODS: We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). RESULTS: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. CONCLUSIONS: The use of several terminologies leads to more precise indexing. The improvement achieved in the model's implementation performance as a result of using semantic relationships is encouraging.
Quantin C, Allaert F-A, Avillach P, Riandey B, Fieschi M, Fassa M, Cohen O. Proposal of a French health identification number interoperable at the European level. Stud Health Technol Inform 2007;129(Pt 1):503-7.
The French Ministry of Health is setting up the Personal Medical Record (PMR). This innovative tool has long been expected by French Health Authorities, Associations of Patients, other health associations, those defending Individual Liberties and the French National Data Protection Authority. The PMR will lead to improvements in many areas such as Diagnosis (Research and monitoring), Healthcare (Management of emergencies, urgent situations, Temporal health monitoring and evaluation), Therapy (Cohorts of patients for Clinical trials and epidemiological studies). The PMR will foster safe healthcare management, clinical research and epidemiological studies. Nevertheless, it raises many important questions regarding duplicates and the quality, precision and coherence of the linkage with other health data coming from different sources. The currently planned identifying process raises many questions with regard to its ability to deal with potential duplicates and to perform data linkage with other health data sources. Through this article, using the electronic health records, we develop and propose an identification process to improve the French PMR. Our proposed unique patient identifier will guarantee the security, confidentiality and privacy of the personal data, and will prove to be particularly useful for health planning, health policies and research as well as clinical and epidemiological studies. Finally, it will certainly be interoperable with other European health information systems. We propose here an alternative identification procedure that would allow France to broaden the scope of its PMR project by making it possible to contribute to public health research and policy while increasing interoperability with European health information systems and preserving the confidentiality of the data.