Adverse-effect statement normalization
In the adverse-effect statement normalization step (Note1), drug names and adverse effects, which are worded differently from one discharge summary to another, are matched to entries in a dictionary. Specifically, each drug name is matched to a therapeutic category, and each adverse effect is matched to a term in MedDRA/J, a standardized terminology for describing adverse effects. Words with the same meaning are matched using a Support Vector Machine (SVM) based method that resolves orthographic variance (Note2). Beyond resolving orthographic variance, this step normalizes drug names by their therapeutic categories and adverse effects by the hierarchical structure of MedDRA/J. To increase the number of expressions that can be matched, terms equivalent to those in MedDRA/J are extracted and added to the dictionary.
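A minimal sketch of the matching step, assuming a simple dictionary lookup. The SVM judgement is replaced here by a hypothetical stand-in, `canonical_key`, which only folds character width (Unicode NFKC, which also maps half-width katakana to full-width), letter case and whitespace; the names `build_index` and `normalize` are illustrative and not from the original system.

```python
import unicodedata
from typing import Dict, Iterable, Optional


def canonical_key(text: str) -> str:
    """Fold width variants (NFKC), letter case and whitespace into one key.

    Hypothetical stand-in for the SVM equivalence judgement: it only catches
    variants that differ in character width, case or spacing.
    """
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(folded.split())


def build_index(dictionary_terms: Iterable[str]) -> Dict[str, str]:
    """Map each canonical key to the dictionary term it came from."""
    return {canonical_key(term): term for term in dictionary_terms}


def normalize(expression: str, index: Dict[str, str]) -> Optional[str]:
    """Return the dictionary term matching `expression`, or None if unmatched."""
    return index.get(canonical_key(expression))


if __name__ == "__main__":
    index = build_index(["Headache", "Nausea"])
    print(normalize("ｈｅａｄａｃｈｅ", index))  # full-width input -> Headache
    print(normalize("  nausea ", index))          # -> Nausea
    print(normalize("rash", index))               # -> None
```

A rule like this catches only mechanical variants; spelling differences such as "center"/"centre" are what the trained classifier is for.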
(1) Extended MedDRA/J dictionary (dictionary extension)
MedDRA is a hierarchical medical terminology developed by the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH), and MedDRA/J is its Japanese version. MedDRA/J has a five-level hierarchy consisting, from the most general to the most specific, of System Organ Class (SOC), High Level Group Term (HLGT), High Level Term (HLT), Preferred Term (PT), and Lowest Level Term (LLT). In Japan, MedDRA/J terms are recommended for describing the adverse effects of drugs. However, not every term in MedDRA/J denotes an adverse effect, and discharge summaries also express adverse effects in many ways that MedDRA/J does not list. The dictionary is therefore extended by adding symptom expressions that represent adverse effects, as follows.
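The five-level hierarchy can be sketched as parent links that let a specific term be rolled up to a more general level, for example from an LLT to its PT. The entries below are illustrative and are not copied from MedDRA/J; the sketch also assumes a single (primary) parent per term, whereas MedDRA allows a PT to be linked to more than one SOC. `HIERARCHY`, `path_to_root` and `roll_up` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative entries only (not copied from MedDRA/J): term -> (level, parent).
# Only one primary parent per term is modelled; the SOC has no parent.
HIERARCHY: Dict[str, Tuple[str, Optional[str]]] = {
    "head pain": ("LLT", "headache"),
    "headache": ("PT", "headaches nec"),
    "headaches nec": ("HLT", "headaches"),
    "headaches": ("HLGT", "nervous system disorders"),
    "nervous system disorders": ("SOC", None),
}


def path_to_root(term: str) -> List[Tuple[str, str]]:
    """Return [(level, term), ...] from `term` up to its SOC."""
    path: List[Tuple[str, str]] = []
    current: Optional[str] = term
    while current is not None:
        level, parent = HIERARCHY[current]
        path.append((level, current))
        current = parent
    return path


def roll_up(term: str, level: str) -> Optional[str]:
    """Return the ancestor of `term` at `level` (e.g. "PT"), or None."""
    if term not in HIERARCHY:
        return None
    for lvl, name in path_to_root(term):
        if lvl == level:
            return name
    return None


if __name__ == "__main__":
    print(roll_up("head pain", "PT"))   # -> headache
    print(roll_up("head pain", "SOC"))  # -> nervous system disorders
```

Rolling every matched term up to the PT level is one way two differently worded LLTs can be counted as the same adverse effect.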
First, words listed in the Adverse Reactions section of package inserts are extracted as candidates for addition to MedDRA/J. The system then compares each extracted word with the terms in MedDRA/J and uses the orthographic disambiguation method to judge whether the word is equivalent to a MedDRA/J term. Words judged equivalent are added to the dictionary.
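The extension procedure can be sketched as a loop over candidate words with a pluggable equivalence test. In this sketch the SVM is replaced by `toy_equivalent`, a hypothetical stand-in that accepts pairs whose `difflib` similarity ratio is at least 0.85; the threshold, the first-match rule and all function names are assumptions for illustration only.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, Set


def toy_equivalent(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical stand-in for the SVM judgement: similarity ratio >= threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def extend_dictionary(
    meddra_terms: Set[str],
    candidates: Iterable[str],
    is_equivalent: Callable[[str, str], bool] = toy_equivalent,
) -> Dict[str, str]:
    """Return {candidate: MedDRA/J term} for candidates judged equivalent to a term.

    Candidates already present verbatim are skipped; each remaining candidate
    is linked to the first term (in sorted order) that `is_equivalent` accepts.
    """
    added: Dict[str, str] = {}
    for candidate in candidates:
        if candidate in meddra_terms or candidate in added:
            continue
        for term in sorted(meddra_terms):
            if is_equivalent(candidate, term):
                added[candidate] = term
                break
    return added


if __name__ == "__main__":
    terms = {"nausea", "vomiting", "diarrhoea"}
    insert_words = ["nausea", "diarrhea", "rash"]  # hypothetical package-insert words
    print(extend_dictionary(terms, insert_words))  # -> {'diarrhea': 'diarrhoea'}
```

The pairwise comparison is quadratic in the number of words; a real system would likely prefilter candidate pairs before invoking the classifier.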
(2) Drug-efficacy list
Each drug extracted as the cause of an adverse effect is manually assigned a three-digit therapeutic category code, which classifies drugs by their intended use. Because a drug may have more than one therapeutic category code, the correspondence between drugs and codes is not necessarily one-to-one.
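Because one drug can carry several codes, a natural representation is a mapping from each drug to a set of codes, which can be inverted to list the drugs in each category. The drug names and three-digit codes below are hypothetical placeholders, not real therapeutic category assignments, and the helper names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical drugs and placeholder three-digit codes, for illustration only;
# these are not real therapeutic category assignments.
DRUG_TO_CODES: Dict[str, Set[str]] = {
    "drug_a": {"101"},
    "drug_b": {"101", "202"},  # one drug, two therapeutic categories
    "drug_c": {"202"},
}


def drugs_by_code(drug_to_codes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert drug -> codes into code -> drugs (a many-to-many relation)."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for drug, codes in drug_to_codes.items():
        for code in codes:
            inverted[code].add(drug)
    return dict(inverted)


def categories_for(drug: str, drug_to_codes: Dict[str, Set[str]]) -> Set[str]:
    """All codes assigned to a drug (empty set if the drug has not been coded)."""
    return drug_to_codes.get(drug, set())


if __name__ == "__main__":
    print(sorted(categories_for("drug_b", DRUG_TO_CODES)))  # -> ['101', '202']
```

Storing a set per drug, rather than a single code, keeps the one-to-many relationship explicit.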
(3) Orthographic disambiguation
As shown in Fig. 2, the SVM-based orthographic disambiguation method judges whether two words are equivalent, using as features the characters that differ between the two words (DIFF) and the characters immediately before and after them (PRE and POST).
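The DIFF, PRE and POST features can be sketched by aligning the two words with Python's `difflib.SequenceMatcher` and reading off each region where they differ together with its neighbouring characters. The alignment algorithm, the `^`/`$` boundary markers, the `"a->b"` encoding of DIFF and the function names are assumptions; the original alignment method may differ, and the SVM itself is not shown.

```python
from difflib import SequenceMatcher
from typing import Dict, List

BOS, EOS = "^", "$"  # assumed markers for when a differing span touches a word boundary


def diff_features(a: str, b: str) -> List[Dict[str, str]]:
    """Return one {DIFF, PRE, POST} feature dict per region where a and b differ."""
    features: List[Dict[str, str]] = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Non-equal regions are flanked by equal regions, so the neighbours
        # taken from `a` are the same characters as those in `b`.
        features.append({
            "DIFF": f"{a[i1:i2]}->{b[j1:j2]}",
            "PRE": a[i1 - 1] if i1 > 0 else BOS,
            "POST": a[i2] if i2 < len(a) else EOS,
        })
    return features


def to_feature_strings(features: List[Dict[str, str]]) -> List[str]:
    """Flatten feature dicts into sorted "NAME=value" strings for a sparse vector."""
    return sorted({f"{key}={value}" for feat in features for key, value in feat.items()})


if __name__ == "__main__":
    print(diff_features("organise", "organize"))
    # -> [{'DIFF': 's->z', 'PRE': 'i', 'POST': 'e'}]
    print(to_feature_strings(diff_features("colour", "color")))
    # -> ['DIFF=u->', 'POST=r', 'PRE=o']
```

With this alignment a transposition such as "center"/"centre" is split into an insertion and a deletion, so it yields two feature regions. Feature strings like these could then be vectorized and passed to a linear SVM (for example scikit-learn's `DictVectorizer` with `LinearSVC`), though that is only one possible route, not necessarily the authors' setup.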
- Note1 Conversion of data into a more usable form according to certain rules
- Note2 Use of differently spelled words of the same meaning and pronunciation, such as the words "center" and "centre."