A model for indexing medical documents combining statistical and symbolic knowledge.

Date Published:

2007

Abstract:

OBJECTIVES: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. METHODS: We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). RESULTS: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. CONCLUSIONS: The use of several terminologies leads to more precise indexing. The improvement achieved in the models implementation performances as a result of using semantic relationships is encouraging.