Adverse-effect relation extraction
In the adverse-effect relation extraction step, pairs consisting of a disease or symptom and the drug that caused it are extracted from discharge summaries. A discharge summary is a clinical document that briefly describes the patient's background prior to admission and the clinical course during hospitalization.
When the text of a discharge summary is input into the system, the system first extracts medical terms, such as the names of drugs, symptoms, and medical tests, from the text. Then, for each pair of a symptom and a drug extracted from the text, the system determines whether the symptom was caused by the drug.
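The two-step flow (term extraction, then pairwise classification) can be sketched in Python as follows. This is a minimal sketch, not the system's implementation: the `Term` tuple layout, the category labels `"drug"` and `"symptom"`, and the two injected callables `extract_terms` and `is_adverse_effect` are hypothetical stand-ins for the term extractor and relation classifier described in the subsections that follow.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical term representation: (surface, category, start offset, end offset)
Term = Tuple[str, str, int, int]


def candidate_pairs(terms: List[Term]) -> List[Tuple[Term, Term]]:
    """Pair every extracted drug with every extracted symptom."""
    drugs = [t for t in terms if t[1] == "drug"]
    symptoms = [t for t in terms if t[1] == "symptom"]
    return list(product(drugs, symptoms))


def extract_adverse_effects(
    text: str,
    extract_terms: Callable[[str], List[Term]],
    is_adverse_effect: Callable[[str, Term, Term], bool],
) -> List[Tuple[str, str]]:
    """Step 1: extract medical terms. Step 2: classify each drug-symptom pair."""
    terms = extract_terms(text)
    return [
        (drug[0], symptom[0])
        for drug, symptom in candidate_pairs(terms)
        if is_adverse_effect(text, drug, symptom)
    ]
```

Passing both steps in as functions keeps the sketch agnostic about the models behind them (a CRF-based tagger and an SVM classifier in the system described here).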
(1) Discharge summary corpus (Note1)
An annotation (Note2) tool is used to annotate discharge summaries with modalities and adverse-effect relation information. Modalities include types such as S/O, PURPOSE, NEGATION, and NECESSITY, all of which occur frequently in medical text. The resulting annotated corpus is used for machine learning (Note3).
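The annotation tool's storage format is not described here, so the following is only a hypothetical sketch of how one annotated discharge summary might be represented: terms as character-offset spans with an optional modality label, and adverse-effect relations as links between term IDs. All class and field names are assumptions, and attaching the modality to individual terms is likewise assumed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entity:
    id: str
    category: str                   # e.g. "drug", "symptom" (hypothetical labels)
    start: int                      # character offsets into the document text
    end: int
    modality: Optional[str] = None  # e.g. "NEGATION", "PURPOSE", "S/O", "NECESSITY"


@dataclass
class AdverseEffect:
    drug_id: str                    # Entity.id of the causative drug
    symptom_id: str                 # Entity.id of the induced symptom


@dataclass
class AnnotatedDocument:
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[AdverseEffect] = field(default_factory=list)

    def surface(self, entity_id: str) -> str:
        """Return the text span covered by an entity."""
        e = next(e for e in self.entities if e.id == entity_id)
        return self.text[e.start:e.end]

    def relation_pairs(self) -> List[Tuple[str, str]]:
        """Return (drug, symptom) surface pairs for every annotated relation."""
        return [(self.surface(r.drug_id), self.surface(r.symptom_id)) for r in self.relations]
```

Keeping annotations as offsets rather than inline markup leaves the original text untouched, which makes it straightforward to derive both the IOB2 training data for term extraction and the positive/negative pairs for relation extraction from the same record.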
(2) Medical term extraction
Medical terms belonging to 13 categories, such as drugs, symptoms, and test results, are automatically extracted from discharge summaries. To extract medical terms, the system first converts the input discharge-summary text, character by character, into IOB2 format (Note4). It then predicts the tag sequence using a machine learning model based on conditional random fields (CRF). The features (Note5) used for machine learning are the morphological information (Note6) of the words that contain each of five characters: the character being judged and the two characters immediately preceding and following it. For example, to judge whether the character "腫" in the phrase "下肢浮腫の改善" (improvement in leg edema) is part of a symptom term, the system uses as features the morphological information of the words "下肢" (leg), "浮腫" (edema), "の" (in), and "改善" (improvement), to which the five characters "肢," "浮," "腫," "の," and "改" respectively belong.
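The character-level IOB2 encoding and the five-character feature window can be sketched as follows. This is a minimal sketch under stated assumptions: term spans are given as character offsets; the morphological analysis is supplied as (surface, part-of-speech) pairs whose surfaces concatenate to the text (the analyzer and the exact morphological attributes the system uses are not specified, so the POS labels are illustrative); and the feature names `word[k]`/`pos[k]` are hypothetical. The CRF training step itself is not shown; per-character feature dictionaries like these are the input format that toolkits such as sklearn-crfsuite accept.

```python
from typing import Dict, List, Tuple


def to_iob2(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Tag each character: B-<cat> opens a term, I-<cat> continues it, O is outside."""
    tags = ["O"] * len(text)
    for start, end, category in spans:
        tags[start] = f"B-{category}"
        for i in range(start + 1, end):
            tags[i] = f"I-{category}"
    return tags


def window_features(
    morphemes: List[Tuple[str, str]], window: int = 2
) -> List[Dict[str, str]]:
    """Per-character features: the morphemes containing the character itself
    and the `window` characters on each side (five characters for window=2)."""
    # owner[c] = index of the morpheme that contains character c
    owner = [m for m, (surface, _pos) in enumerate(morphemes) for _ch in surface]
    features = []
    for i in range(len(owner)):
        feats = {}
        for offset in range(-window, window + 1):
            j = i + offset
            if 0 <= j < len(owner):  # positions beyond the text contribute no feature
                surface, pos = morphemes[owner[j]]
                feats[f"word[{offset}]"] = surface
                feats[f"pos[{offset}]"] = pos
        features.append(feats)
    return features
```

For the example phrase, `window_features([("下肢", "noun"), ("浮腫", "noun"), ("の", "particle"), ("改善", "noun")])[3]` (the entry for "腫") has `word[-2]` through `word[2]` equal to "下肢", "浮腫", "浮腫", "の", "改善", matching the words listed in the text.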
(3) Adverse-effect relation extraction
The system extracts adverse-effect relations, which are defined as causal relationships in which a drug induces a symptom. Fig. 2 shows an example of an adverse-effect relation.
A machine learning method called the support vector machine (SVM) is used to extract adverse-effect relations. Sentences containing both drug and symptom expressions serve as the machine learning dataset. Among all pairs of a drug and a symptom, development, or medical test result appearing in a sentence, pairs that stand in an adverse-effect relationship are used as positive examples to train a classifier, whereas pairs without such a relationship are used as negative examples. As features for identifying pairs with an adverse-effect relationship, the system uses the number of characters and morphemes between the drug and the symptom, the order in which the drug and the symptom appear in the text, a dependency chain, and clinical expressions in the text.
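The pair features and the positive/negative training setup can be sketched as follows. This is a hypothetical illustration, not the system's code: the dependency-chain feature is omitted because it requires a dependency parser; the cue-expression set passed as `cues` is an assumption; and the classifier is a plain linear SVM trained with Pegasos-style stochastic subgradient steps on the hinge loss (visiting examples in a fixed order for reproducibility), swapped in because the text does not state which SVM implementation or kernel is used.

```python
from typing import List, Sequence, Set, Tuple


def pair_features(morphemes: List[str], drug_idx: int, symptom_idx: int,
                  cues: Set[str]) -> List[float]:
    """Feature vector for one drug-symptom pair in a morpheme-segmented sentence."""
    lo, hi = sorted((drug_idx, symptom_idx))
    between = morphemes[lo + 1:hi]
    return [
        1.0,                                              # constant (acts as bias)
        float(sum(len(m) for m in between)),              # characters between the pair
        float(len(between)),                              # morphemes between the pair
        1.0 if drug_idx < symptom_idx else 0.0,           # drug appears first
        1.0 if any(m in cues for m in between) else 0.0,  # clinical cue expression
    ]


def label_pairs(pairs: Sequence[Tuple[str, str]], gold: Set[Tuple[str, str]]) -> List[int]:
    """+1 for annotated adverse-effect pairs, -1 for every other candidate pair."""
    return [1 if p in gold else -1 for p in pairs]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.01, epochs: int = 100) -> List[float]:
    """Linear SVM via Pegasos-style subgradient steps (fixed example order)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            margin = yi * dot(w, xi)                  # checked against the current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the L2 regularizer
            if margin < 1.0:                          # hinge loss active: step toward xi
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if dot(w, x) > 0 else -1
```

In practice the count features would usually be scaled before training, and a tuned library SVM would replace the toy trainer; the sketch only shows how each candidate pair becomes a labeled feature vector.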
- Note1 A large collection of texts written in a natural language such as Japanese or English. In linguistics and information processing, corpora are used in natural language processing research and as datasets for machine learning.
- Note2 Attachment of related information (metadata) to data.
- Note3 Constructing a predictive model that predicts unknown outcomes from historical data.
- Note4 A format for tagging chunks (spans of characters). B, I, and O denote the beginning of a chunk, the inside of a chunk, and outside any chunk, respectively. In this system, each medical term is treated as a chunk.
- Note5 A clue used for machine learning.
- Note6 Morpheme: the smallest meaningful unit of language, used as the unit into which a sentence written in a natural language is divided.