Research in Japanese Optical Character Recognition

Duewer, Trent A., Department of Computer Science, University of Virginia
Martin, Worthy, Department of Computer Science, University of Virginia

The purpose of this project was to explore better approaches to Japanese Optical Character Recognition (OCR). I researched Japanese OCR by writing a simple program that can recognize some Japanese characters. OCR is the process of scanning a written document and converting the images of the characters in it into text. It saves labor because it eliminates the need to type a written document into the computer. The program developed in this thesis first preprocesses the image to reduce its complexity and to prepare it for the neural network. A character is read from a TIFF file and passed to a thinning algorithm, which reduces the character’s strokes to one pixel in width. After thinning, each marked pixel has exactly two neighboring marked pixels unless it is an endpoint or an intersection. An endpoint has exactly one marked neighboring pixel, and an intersection has more than two marked neighboring pixels. A neural network then takes the thinned character and decides which character it is. I trained the neural network by giving it a set of characters together with the output that each character should produce. Finally, the program writes the recognized character in JIS format to a file that can be read by most Japanese word processors. The new algorithms and procedures developed for this prototype should be useful to others building Japanese OCR programs.
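
The neighbor-count rule for thinned strokes can be illustrated with a minimal Python sketch. This is not the thesis program. It assumes that the thinned character is a grid of 0/1 values, that neighbors are counted over the eight surrounding pixels, and that an intersection is any marked pixel with three or more marked neighbors. The names count_neighbors, classify_pixels, and X_SHAPE are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 1 = marked (stroke) pixel, 0 = background


def count_neighbors(img: Grid, r: int, c: int) -> int:
    """Count marked pixels among the 8 neighbors of (r, c).

    Assumption: 8-connectivity; pixels outside the grid count as background.
    """
    rows, cols = len(img), len(img[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the pixel itself
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                total += img[rr][cc]
    return total


def classify_pixels(img: Grid) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (endpoints, intersections) of a one-pixel-wide skeleton.

    Endpoint: exactly one marked neighbor.
    Intersection (assumed here): three or more marked neighbors.
    Pixels with exactly two marked neighbors are ordinary stroke pixels.
    """
    endpoints, intersections = [], []
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if not value:
                continue
            n = count_neighbors(img, r, c)
            if n == 1:
                endpoints.append((r, c))
            elif n >= 3:
                intersections.append((r, c))
    return endpoints, intersections


# Hypothetical test image: two diagonal strokes crossing at the center.
X_SHAPE: Grid = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]

endpoints, intersections = classify_pixels(X_SHAPE)
print("endpoints:", endpoints)
print("intersections:", intersections)
```

For the crossing strokes in X_SHAPE, the four tips are reported as endpoints and the center pixel as the only intersection. With eight-neighbor counting, strokes that meet at right angles can make the pixels next to a junction exceed two neighbors as well, so a practical implementation would likely need an extra rule to merge such clusters into one intersection.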

BS (Bachelor of Science)

Thesis originally deposited on 2011-12-28 in version 1.28 of Libra. This thesis was migrated to Libra2 on 2016-11-30 15:15:02.

All rights reserved (no additional license for public reuse)
Issued Date: