
Natural Language Processing

Many companies and research agencies conduct surveys on customer satisfaction and market trends in order to become more customer-oriented. Surveys generally include multiple-choice questions, which provide quantitative results, and free-text boxes, which provide qualitative results. As customer preferences grow more diverse, the qualitative information in free-text comments and in comments posted on websites is becoming more important. To analyze this information, however, the person in charge must read the comments one by one and interpret the intended meaning. This process takes considerable time, and when several people share the analysis, the results vary with each person's experience and knowledge.
To solve this problem, Fuji Xerox is researching and developing natural language processing technology that combines computer analysis with human interpretation to effectively collect and analyze the text data written in surveys and other studies.

With this technology, a massive amount of text data can be automatically categorized and organized into comprehensive numerical values and charts, as indicated below. These charts provide a quantitative overview and make trends and changes across the entire market easy to communicate to others. Because the quantitative and visual data is generated automatically, the analyst can concentrate more thoroughly on the analysis itself.

The following usage example illustrates this technology.

Example of a survey: "Tell us your thoughts about how to use cell-phones."

The flow of the analysis is described below.

Morphological Analysis
Text data is broken down into morphemes (e.g., nouns, adjectives, particles, verbs), and the "words" are extracted. For nouns, a "word" contains one morpheme; for other elements, it contains multiple morphemes.
Local Chunk Generation
Morphemes and "words" are combined to form a clause called a chunk. (A chunk is the smallest unit of words that carries meaning when a sentence is segmented.)
Modification Generation
The modifier-modified relations are identified for both adnominal modification (modifying an indeclinable word) and adverbial modification (modifying a declinable word).
Meaning Chunk Generation
A meaning chunk is generated, consisting of a predicate and all modifiers that share that predicate.
Database Registration
Each meaning chunk is linked to another by an adverbial or adnominal modification. The links between modifier chunks and modified chunks are registered in the database.
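
The last two steps can be illustrated with a minimal Python sketch. It is not the Fuji Xerox implementation: it assumes the first three steps have already produced chunks with known modification targets, and it treats every declinable chunk as a predicate. The `Chunk` class, its fields, the two functions, and the hand-annotated English input are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    idx: int                # position of the chunk in the sentence
    text: str               # surface text of the chunk
    declinable: bool        # True for verb/adjective chunks (assumed to act as predicates)
    head: Optional[int]     # index of the chunk this one modifies, or None


def meaning_chunks(chunks):
    """Group every declinable chunk with the chunks that directly modify it."""
    modifiers = defaultdict(list)
    for c in chunks:
        if c.head is not None:
            modifiers[c.head].append(c.text)
    return [
        {"predicate": c.text, "modifiers": modifiers[c.idx]}
        for c in chunks
        if c.declinable
    ]


def modification_links(chunks):
    """List (modifier, modified, kind) records, as a stand-in for database rows."""
    by_idx = {c.idx: c for c in chunks}
    links = []
    for c in chunks:
        if c.head is None:
            continue
        head = by_idx[c.head]
        # Modifying a declinable word is adverbial; otherwise adnominal.
        kind = "adverbial" if head.declinable else "adnominal"
        links.append((c.text, head.text, kind))
    return links


# Hand-annotated toy input (assumed, not real analyzer output).
sentence = [
    Chunk(0, "crowded", True, 1),
    Chunk(1, "train", False, 3),
    Chunk(2, "loud voice", False, 3),
    Chunk(3, "disturbing", True, None),
]

for mc in meaning_chunks(sentence):
    print(mc)
for link in modification_links(sentence):
    print(link)
```

In a real system the `head` values would come from a morphological analyzer and a dependency step rather than from manual annotation, and the link records would be written to a database instead of a list.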

Analysis Results

Fig. 1, "Topic Graph," Fig. 2, "Percentage Difference Graph (with numerical values)," and Table 1, "Modification Frequency," present the meaning chunks contained in the database. From these figures and the table, analysts can obtain the information described below.

Fig. 1 Topic Graph
This figure shows the relations between the "topics" (nouns) in a text. It indicates which topics are likely to be used together with a given topic. The relations between topics can be understood visually from the number and direction of the arrows between them.

We can see that "Power," "OK," "Voice," "Manners," and "Disturbing" are related to "Train" in this text because these topics are connected to it by pink arrows. "Train" and "Manners" are connected by arrows in both directions, indicating that the two topics are closely related.
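
The source does not state how the arrows are derived. As one possible reading, the sketch below draws an arrow from topic A to topic B when B appears in at least half of the comments that mention A. This conditional-frequency rule, the `threshold` value, the `topic_edges` function, and the sample comments are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def topic_edges(comments, threshold=0.5):
    """Return directed edges (a, b) where b appears in at least `threshold`
    of the comments that mention a. The rule is an illustrative assumption."""
    topic_count = Counter()
    pair_count = Counter()
    for topics in comments:
        unique = set(topics)
        topic_count.update(unique)
        # Count every ordered pair of distinct topics once per comment.
        pair_count.update(permutations(sorted(unique), 2))
    return {
        (a, b)
        for (a, b), n in pair_count.items()
        if n / topic_count[a] >= threshold
    }


# Each inner list holds the topics (nouns) found in one comment (made-up data).
comments = [
    ["Train", "Manners"],
    ["Train", "Manners", "Voice"],
    ["Train", "Power"],
    ["Manners", "Train"],
]
edges = topic_edges(comments)

# Arrows in both directions suggest a close relation under this rule.
mutual = {(a, b) for (a, b) in edges if (b, a) in edges and a < b}
print(sorted(edges))
print(sorted(mutual))
```

Under this rule, an arrow in only one direction means that one topic usually brings up the other but not the reverse, which is one way a graph could show direction as well as connection.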

Fig. 2 Percentage Difference Graph
The number of occurrences of each topic in the text is counted and ranked. This figure enables analysts to visually grasp how topic usage differs according to the gender, age, and other classifications of the respondents (e.g., whether a topic is used more or less than others in a certain classification).
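
The exact formula behind the graph is not given in the source. A minimal sketch, assuming the "percentage difference" is the share of respondents in one classification who mention a topic minus the share among all respondents, might look as follows. The `percentage_difference` function, the group labels, and the sample responses are hypothetical.

```python
from collections import Counter


def percentage_difference(responses, group):
    """responses: list of (group_label, [topics]).
    Returns {topic: difference in percentage points} between the share of
    `group` respondents mentioning the topic and the share of all respondents."""
    overall = Counter()
    in_group = Counter()
    n_all = 0
    n_group = 0
    for label, topics in responses:
        unique = set(topics)          # count each topic once per respondent
        overall.update(unique)
        n_all += 1
        if label == group:
            in_group.update(unique)
            n_group += 1
    if n_group == 0:
        return {}
    return {
        t: 100 * in_group[t] / n_group - 100 * overall[t] / n_all
        for t in overall
    }


# Made-up responses: (classification, topics mentioned).
responses = [
    ("F", ["Train", "Manners"]),
    ("F", ["Manners"]),
    ("M", ["Train"]),
    ("M", ["Power"]),
]
diff = percentage_difference(responses, "F")
for topic, value in sorted(diff.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{topic}: {value:+.1f}")
```

A positive value means the topic is mentioned more often in the chosen classification than overall, and a negative value means it is mentioned less often.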

Table 1 Modification Frequency
This table shows how frequently a given relation occurs between a noun (modifier) and a verb (modified), or between a noun (modifier) and an adjective or adverb (modified). This enables analysts to understand the qualitative categorization of opinions and how often each opinion is reported.
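
As a rough illustration, such a table can be built by counting identical modifier-modified pairs. The pairs below are invented sample data, and the layout is an assumption rather than the format of the actual table.

```python
from collections import Counter

# (noun modifier, modified verb/adjective/adverb) pairs; made-up sample data.
pairs = [
    ("voice", "loud"),
    ("voice", "loud"),
    ("power", "turn off"),
    ("manners", "bad"),
]

freq = Counter(pairs)
table = freq.most_common()   # most frequent relations first

for (modifier, modified), count in table:
    print(f"{modifier:10} {modified:10} {count}")
```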
