Alumni
Daniel Shin
Bio
Daniel Shin will be a junior at Chadwick International, located in Songdo, South Korea, during the 2019–2020 school year. At Chadwick International, he is involved with a VEX robotics team, the programming club, the economics research club, and varsity badminton. Outside of school, Daniel spends time researching computer architecture, reading comics, and browsing 9gag.
Project: Character Shape Analysis
Goal
The goal of this project is to create a program that recognizes words in an image. The most common AI-assisted image processing project to date is handwritten digit analysis using the MNIST dataset, a collection of handwritten digit samples normalized to a uniform size. This project extends that idea: it first evaluates handwritten characters (from the EMNIST dataset), then progresses to recognizing whole words and evaluating the "possible" words that can be derived from a piece of writing.
Summary of Results
During this project, algorithms were designed to recognize words in an image, but the results were not as satisfying as expected. The trained neural network itself was quite accurate, but other issues remained. Using character-based relationships, the program successfully prevented consonants from following consonants where that is implausible, such as "Q" following "M"; however, the vowel substituted in its place was not always correct. Other approaches were also tried, such as adding weights. Only after weights were introduced to reduce the influence of the character association map did the combination of the two probability values yield confident results.
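The weighting idea can be illustrated with a minimal sketch. This is not the project's actual code: the function name `combine_scores`, the exponent-style `weight`, the floor value for unseen pairs, and all probabilities below are assumptions chosen only to show how damping the character association map changes the outcome.

```python
def combine_scores(nn_probs, prev_char, transition, weight=0.3, floor=1e-3):
    """Blend classifier probabilities with character-transition probabilities.

    nn_probs:   dict of candidate character -> classifier probability
    transition: dict of (previous char, char) -> P(char | previous char)
    weight:     exponent in [0, 1]; smaller values damp the transition map
    floor:      assumed probability for pairs missing from the map
    """
    scores = {}
    for ch, p in nn_probs.items():
        t = transition.get((prev_char, ch), floor)
        # Raising t to a power below 1 pulls it toward 1, reducing its influence.
        scores[ch] = p * (t ** weight)
    total = sum(scores.values())
    # Normalize so the blended scores sum to 1.
    return {ch: s / total for ch, s in scores.items()}


# Toy values (hypothetical): the classifier slightly prefers "Q" after an "M".
nn_probs = {"Q": 0.55, "O": 0.45}
transition = {("M", "O"): 0.20, ("M", "Q"): 0.001}

with_map = combine_scores(nn_probs, "M", transition, weight=0.3)
without_map = combine_scores(nn_probs, "M", transition, weight=0.0)

best_with_map = max(with_map, key=with_map.get)
best_without_map = max(without_map, key=without_map.get)
```

With `weight=0.0` the transition map is ignored and the classifier's choice stands; with a small positive weight, the implausible "MQ" pair is penalized enough for "O" to win. The right weight would need to be tuned on real data.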
Future Work
There are various methods for implementing OCR. In future work, neural networks that take whole words and sentences as input could lead to a more feasible and easier-to-use product. Furthermore, applying similar concepts from this project at the word level, using word mapping (Markov chains), it might be possible to detect wrong characters or words in a sentence and then use image processing neural networks to compensate by replacing certain characters.
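The word-mapping idea might look something like the following sketch, which builds a first-order Markov chain over words and flags positions whose transition was never observed. The helper names, the tiny corpus, and the "never seen" criterion are all illustrative assumptions, not part of the original project.

```python
from collections import defaultdict


def build_word_chain(sentences):
    """Count how often each word follows another in the training sentences."""
    chain = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            chain[prev][nxt] += 1
    return chain


def flag_suspect_words(sentence, chain):
    """Return indices of words whose transition from the previous word is unseen."""
    words = sentence.lower().split()
    suspects = []
    for i in range(1, len(words)):
        prev, word = words[i - 1], words[i]
        # .get avoids creating new entries in the defaultdict during lookup.
        if chain.get(prev, {}).get(word, 0) == 0:
            suspects.append(i)
    return suspects


# Hypothetical toy corpus; a real system would need far more text.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
chain = build_word_chain(corpus)

suspects = flag_suspect_words("the cat sat on the mot", chain)
clean = flag_suspect_words("the cat sat on the mat", chain)
```

Flagged positions could then be passed back to the character recognizer for a second look. Note that an error in the middle of a sentence would flag both the wrong word and the word after it, so a practical version would likely need smoothing or a probability threshold rather than a strict seen/unseen test.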