As the globalization of both economies and culture progresses, it is becoming increasingly important to gather information from other countries quickly and to communicate smoothly with people who speak other languages.
To respond to these needs, various Internet-based machine translation services are already available, but these services still leave many issues unaddressed. For example, printed matter, such as books and other distributed materials, must be typed into a computer via a keyboard every time, which is not user-friendly. Furthermore, for languages written in scripts that are unfamiliar to the user, such as Chinese and Korean, the user may not be able to read the characters, and inputting them is not an easy task. The same is true of Japanese for those who are not fluent in it. Fuji Xerox is therefore developing a system that can easily translate printed materials.
The service we are developing at Fuji Xerox is shown in Figure 1. First, place the document to be translated on the flat-bed scanner of the multifunction device. Next, select the language into which the document is to be translated and press the start button. The translated document is then printed out to the tray.
Figure 1
In this way, a document can be translated as easily as if it were being copied. Because the document need not be typed into a computer, translation is very efficient. And because the document is read by the multifunction device, a translation can be obtained even if the user cannot read the original document.
At Fuji Xerox, our goal isn’t just to translate documents; we are focused on how to transcend the language barrier and facilitate communication through the use of documents. We believe this service will be extremely useful for helping users with different native languages to share information through the same document. We therefore suggest using a “ruby-style translation”, whereby the translation is added to the original document as ruby characters (small annotations placed alongside the original text), so that the two languages are located close to each other for easier reference. This method also has the advantage that the reader does not have to look back and forth between the original and the translated documents.
Alternatively, if the user only wants to read the translation, we suggest using “replacement translation”, which lays out the translated text in exactly the same way as the original. The user can choose the desired method of translation, and because both styles reproduce the illustrations and pictures in the translated document, the content of the document is easy to understand.
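As a rough illustration of the difference between the two styles, the sketch below renders the same source/translation pairs both ways. It assumes HTML output and uses the standard HTML `ruby` and `rt` elements purely for illustration; the function name `render`, the mode names, and the sample pair are hypothetical, and the actual service prints onto paper rather than producing HTML.

```python
from html import escape
from typing import List, Tuple


def render(pairs: List[Tuple[str, str]], mode: str) -> str:
    """Render (original, translation) pairs in one of two illustrative styles.

    "ruby"        keeps the original text and annotates it with the translation.
    "replacement" shows only the translation in place of the original.
    """
    if mode == "ruby":
        # Original text is the ruby base; the translation is the annotation.
        return "".join(
            f"<ruby>{escape(src)}<rt>{escape(dst)}</rt></ruby>"
            for src, dst in pairs
        )
    if mode == "replacement":
        # The original text is dropped; the translation takes its slot.
        return "".join(f"<span>{escape(dst)}</span>" for _, dst in pairs)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical sample pair, for illustration only.
sample = [("会議", "meeting")]
print(render(sample, "ruby"))
print(render(sample, "replacement"))
```

In the ruby style both languages stay visible side by side, whereas in the replacement style only the translation remains in the original position.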
The image processing technology that Fuji Xerox has accumulated plays an important role in realizing these features (Figure 2).
First, the scanned document is separated through image analysis into text areas, image areas (which include pictures and illustrations), and ruled areas, and the layout information is extracted. The image areas and the ruled areas are reproduced as a background image on the basis of the layout information. Meanwhile, the text areas are processed by optical character recognition (OCR) to extract the text information. The extracted text is then translated into the target language, an appropriate font size and position are calculated from the layout information, and the translated text is overlaid on the background image. Combining these image processing technologies allows a document to be translated into a specified language with a simple operation.
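The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Fuji Xerox implementation: the `Region` and `PlacedText` types, the `translate` callback, and the simple character-width model in `fit_font_size` are all hypothetical, and OCR is represented only by a pre-filled `text` field on each text region.

```python
from dataclasses import dataclass
from math import ceil
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in pixels


@dataclass
class Region:
    kind: str       # "text", "image", or "ruled"
    box: Box
    text: str = ""  # stands in for the OCR result of a text region


@dataclass
class PlacedText:
    box: Box
    text: str
    font_size: int


def fit_font_size(text: str, width: int, height: int,
                  max_size: int = 24, min_size: int = 6,
                  char_aspect: float = 0.6,
                  line_spacing: float = 1.2) -> int:
    """Return the largest font size at which the wrapped text fits the box.

    Assumes every character is char_aspect * size wide (a crude model).
    """
    for size in range(max_size, min_size - 1, -1):
        chars_per_line = int(width // (size * char_aspect))
        if chars_per_line < 1:
            continue
        lines = ceil(len(text) / chars_per_line) if text else 1
        if lines * size * line_spacing <= height:
            return size
    return min_size  # fall back to the smallest allowed size


def translate_page(regions: List[Region],
                   translate: Callable[[str], str]
                   ) -> Tuple[List[Region], List[PlacedText]]:
    """Split a page into background regions and translated text overlays."""
    # Image and ruled areas are kept unchanged as the background.
    background = [r for r in regions if r.kind in ("image", "ruled")]
    overlay = []
    for r in regions:
        if r.kind != "text":
            continue
        translated = translate(r.text)
        _, _, w, h = r.box
        # Position is inherited from the original text area; size is refitted.
        overlay.append(PlacedText(r.box, translated,
                                  fit_font_size(translated, w, h)))
    return background, overlay
```

A caller would pass the regions found by layout analysis together with any translation function, then draw the returned overlay items on top of the background. Reusing each text area's original box is what keeps the translated page laid out like the source.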