As globalization accelerates both economically and culturally, demand is growing for access to the latest information from around the world and for smooth communication with people in other countries. To meet these needs, web-based machine translation services are available for a wide range of languages. Using such services to translate printed materials such as books and handouts is cumbersome, however, because the user must first type the text into the machine translation system. For languages with complex scripts, such as Korean and Chinese, entering the text on a keyboard is particularly difficult. Fuji Xerox has developed Scan Translation Technology to address these issues and make the translation process easier.
Fig. 1 depicts a common way of using our Scan Translation Service. First, the user places the document to be translated on the scanner of a multifunction device. Then, after selecting the source language of the document, the target language of the translation, and the dictionary to be used, the user presses the start button. Once an email notification confirms that the translation process has been completed, the user can print out the translated result. Scanning the document on a multifunction device thus eliminates the time and effort needed to type in text, streamlining the translation process. In addition, the service makes it possible to obtain a translation even when the user cannot read the language of the original document.
Our goal at Fuji Xerox is not just to translate documents; we are also focused on how to transcend the language barrier and facilitate better communication through the use of documents. We believe this service will be extremely useful in helping users who speak different native languages to share information through the same document. We therefore suggest using a "ruby-style translation," in which the translation is added to the original document as ruby characters (small annotations placed alongside the base text), so that the two languages sit close together for easy reference. This layout also spares the reader from frequently looking back and forth between the original and translated documents.
When the reader wishes to read only the translated text, we suggest using "replacement translation," in which the original text is replaced by its translation. There is also "word translation," in which only certain words are translated and annotated in ruby style. The translation style can be selected according to the user's preference, and all styles faithfully reproduce the figures and photos in the original document, making the document easy to comprehend.
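The three styles described above can be illustrated with a minimal sketch. It is not the service's implementation: the service composes its output on the page image, whereas this example uses HTML `<ruby>` markup purely as a convenient way to show the difference between the styles. The `Segment` type, the function names, and the glossary dictionary are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List


@dataclass
class Segment:
    """Hypothetical unit of text: a source phrase and its translation."""
    original: str
    translation: str


def ruby_style(segments: List[Segment]) -> str:
    """Keep the original text and annotate each segment with its translation."""
    return "".join(
        f"<ruby>{escape(s.original)}<rt>{escape(s.translation)}</rt></ruby>"
        for s in segments
    )


def replacement_style(segments: List[Segment]) -> str:
    """Show only the translation in place of the original text."""
    return " ".join(escape(s.translation) for s in segments)


def word_style(text: str, glossary: Dict[str, str]) -> str:
    """Annotate only the words found in the glossary; leave the rest as-is."""
    parts = []
    for word in text.split():
        if word in glossary:
            parts.append(
                f"<ruby>{escape(word)}<rt>{escape(glossary[word])}</rt></ruby>"
            )
        else:
            parts.append(escape(word))
    return " ".join(parts)


if __name__ == "__main__":
    segs = [Segment("Hello", "こんにちは"), Segment("world", "世界")]
    print(ruby_style(segs))
    print(replacement_style(segs))
    print(word_style("Hello world", {"world": "世界"}))
```

In this sketch, ruby style and word style differ only in how much of the text carries an annotation, which mirrors the description above: the former annotates everything, the latter only selected words.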
This translation service can be used not only for paper documents but also for electronic documents. In addition, documents can be printed from a multifunction device or downloaded as electronic documents (Fig. 2).
The image processing technology cultivated by Fuji Xerox plays an important role in enabling the features above (Fig. 3).
First, preprocessing such as skew correction is performed on the scanned document image. Image analysis then separates the page into text areas, image areas (pictures and illustrations), and ruled-line areas, and extracts layout information. The image and ruled-line areas are reproduced as background images according to the layout information. Meanwhile, the areas identified as text are processed by optical character recognition (OCR) to extract the text. The extracted text is then translated into the target language and overlaid on the background images at the font size and position calculated from the layout information.
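The flow above, from segmented regions to overlaid translations, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual Fuji Xerox implementation: the `Region` and `Overlay` types are hypothetical, the `ocr` and `translate` arguments are placeholder callables supplied by the caller, and the font-size calculation uses a crude fixed-width glyph model (each character is assumed to be `char_aspect` times as wide as the font size) in place of real font metrics.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in pixels


@dataclass
class Region:
    """Hypothetical result of layout analysis for one area of the page."""
    kind: str  # "text", "image", or "ruled"
    box: Box


@dataclass
class Overlay:
    """Translated text to draw on top of the background at a given box."""
    box: Box
    text: str
    font_size: int


def fit_font_size(text: str, box: Box, max_size: int = 48, min_size: int = 6,
                  char_aspect: float = 0.6, line_spacing: float = 1.2) -> int:
    """Return the largest font size at which `text` fits inside `box`.

    Assumption: every glyph is `char_aspect * size` wide, and text wraps
    at a fixed number of characters per line.
    """
    _, _, width, height = box
    for size in range(max_size, min_size - 1, -1):
        chars_per_line = max(1, int(width // (size * char_aspect)))
        lines = -(-len(text) // chars_per_line)  # ceiling division
        if lines * size * line_spacing <= height:
            return size
    return min_size  # fall back to the smallest size if nothing fits


def translate_page(regions: List[Region],
                   ocr: Callable[[Box], str],
                   translate: Callable[[str], str]):
    """Split regions into background areas and translated text overlays."""
    # Image and ruled areas are kept as background, untouched.
    background = [r for r in regions if r.kind != "text"]
    overlays = []
    for r in regions:
        if r.kind == "text":
            source = ocr(r.box)         # recognize the text in this area
            target = translate(source)  # translate it into the target language
            overlays.append(Overlay(r.box, target, fit_font_size(target, r.box)))
    return background, overlays


if __name__ == "__main__":
    page = [Region("image", (0, 0, 200, 100)), Region("text", (0, 100, 120, 40))]
    bg, ov = translate_page(page, ocr=lambda box: "hello", translate=str.upper)
    print(len(bg), ov[0].text, ov[0].font_size)
```

Reusing the original text box for the overlay is what keeps the translated page aligned with the original layout; because a translation is often longer or shorter than its source, the font size has to be recomputed per region rather than copied from the original.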