Information Extraction Technology by Means of Language Processing

Natural language processing is an active area of research, and the medical field is no exception: various tools have been developed to extract valuable information from the huge volume of electronic health records. To promote knowledge sharing in business environments, Fuji Xerox has been researching and developing natural language processing technology that extracts valuable information from vast amounts of text data and then aggregates and analyzes it.

To apply this technology to the medical field, Fuji Xerox conducted collaborative research with the University of Tokyo Hospital from 2007 to 2012. The collaboration led to technology that extracts records of medication and its associated adverse effects from the text of discharge summaries (Note1), which give an overview of the progress of treatment since admission. The two organizations also researched and developed a system that automatically creates a contingency table from the extracted data, aggregating adverse-effect statements by drug and adverse effect.

This research integrated several underlying language processing technologies, including term extraction, relation extraction, orthographic disambiguation, and dictionary extension, into an adverse-effect relation aggregation system (Fig. 1) that supports the investigation of situations in which drug-induced adverse effects occur. The system identifies sentences about adverse effects in discharge summaries and aggregates the adverse-effect statements by drug and adverse effect.

  • Note1 Text that briefly describes the patient's background prior to admission and the progress made during hospitalization. Out of respect for privacy, we only used data anonymized by the hospital in this research.
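To make the aggregation step concrete, the following is a minimal sketch of how extracted drug and adverse-effect pairs could be counted into a contingency table. It is an illustration only, not the system's actual implementation: the function name `build_contingency_table`, the pair format, and the sample data are all assumptions, and the pairs stand in for the output of the term and relation extraction stages.

```python
from collections import Counter


def build_contingency_table(relations):
    """Count (drug, adverse_effect) pairs (hypothetical input format).

    Returns sorted drug labels, sorted effect labels, and one row of
    counts per drug, with columns in the order of the effect labels.
    """
    counts = Counter(relations)
    drugs = sorted({drug for drug, _ in counts})
    effects = sorted({effect for _, effect in counts})
    # Counter returns 0 for pairs that never occurred.
    rows = [[counts[(d, e)] for e in effects] for d in drugs]
    return drugs, effects, rows


# Made-up pairs standing in for relation-extraction output.
relations = [
    ("drug A", "rash"),
    ("drug A", "rash"),
    ("drug A", "nausea"),
    ("drug B", "nausea"),
]

drugs, effects, rows = build_contingency_table(relations)
for drug, row in zip(drugs, rows):
    print(drug, dict(zip(effects, row)))
```

In practice, the pairs would need to be normalized before counting (for example, by mapping spelling variants of a drug name to one form), which is presumably where orthographic disambiguation and dictionary extension come in.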

Fig. 1: Configuration of the adverse-effect relation aggregation system

The adverse-effect relation aggregation system mainly consists of the three features shown below.