Steve Park

Class of 2019


Junseo is a rising junior at the Webb Schools in Claremont, California. He is interested in a variety of fields, from literature to math, but hopes to pursue some combination of computer science and biomedical engineering in the future. In his freshman year, he developed an algorithm that diagnoses the severity of diabetic retinopathy; the project won 2nd place at the Los Angeles County Science and Engineering Fair. His recent research concerns the prognosis and survival rate of patients with hematologic cancers (myelodysplastic syndrome and chronic lymphocytic leukemia). Since his freshman year, he has assisted doctors at the Seoul National University Hospital Cancer Institute by organizing genetic data and performing statistical analysis. At his school, he is the captain of the Computer Science and Research club and the captain of the junior varsity volleyball team. In his free time, he plays the cello, which he has played since he was 8 years old.

Project: Automated Acute Leukemia Symptom Analysis from Microscopical Blood Images


Acute lymphoblastic leukemia (ALL), or acute lymphocytic leukemia, is a type of cancer of the blood and bone marrow that affects white blood cells. Diagnosing ALL involves several steps, and one error-prone step is examining peripheral blood slides to find blast cells, which are indicators of blood cancer. In this project, images of peripheral blood slides were analyzed using image processing and machine learning. Image processing was used to pick out the lymphoblasts; specifically, color-based functions were used because lymphoblasts have a very dark purple color that stands out in a peripheral blood slide image. A classifier was then trained and tested to distinguish blast cells from non-blast cells.
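The color-based step described above can be sketched as follows. This is a minimal illustration and not the project's actual code: the threshold values, the function names, and the use of a plain flood fill to group pixels are all assumptions, since the abstract does not state which functions or values were used.

```python
from collections import deque

# Hypothetical thresholds; the abstract does not give the values used.
MAX_BRIGHTNESS = 140     # keep only dark pixels (0-255 scale)
MIN_PURPLE_MARGIN = 20   # red and blue must each exceed green by this much


def is_dark_purple(pixel):
    """Return True if an (R, G, B) pixel looks like a dark purple stain."""
    r, g, b = pixel
    brightness = (r + g + b) / 3
    return (
        brightness <= MAX_BRIGHTNESS
        and r - g >= MIN_PURPLE_MARGIN
        and b - g >= MIN_PURPLE_MARGIN
    )


def find_candidate_regions(image, min_area=4):
    """Group dark purple pixels into 4-connected regions.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    Returns a list of regions, each a list of (row, col) coordinates.
    Regions smaller than `min_area` pixels are discarded as noise.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or not is_dark_purple(image[y][x]):
                continue
            # Breadth-first flood fill starting from this seed pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            region = []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                neighbors = ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1))
                for ny, nx in neighbors:
                    if (
                        0 <= ny < height
                        and 0 <= nx < width
                        and not seen[ny][nx]
                        and is_dark_purple(image[ny][nx])
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_area:
                regions.append(region)
    return regions
```

Each returned region is a candidate cell that could then be cropped and passed to the classifier. The pure-Python loops keep the sketch self-contained; on full-resolution images, an array or image-processing library would be the more practical choice.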

Summary of Results

The original purpose of this project was to diagnose ALL from blood images. However, because sufficient data from any one patient was not available, it was only possible to create a program that analyzes individual blood smear images. Both the classifier and the supporting functions proved well suited to this purpose, with the classifier reaching an accuracy of 98.1%. Expanding the image dataset to cover full slides should therefore make a comprehensive analysis of ALL and blast cells possible. Also, as shown above, the labels on the peripheral blood smear images that contain blast cells caused several errors in the algorithm. In contrast, when the algorithm was tested on 59 blood images of healthy patients, it identified all of the lymphocytes through image processing and classified them as non-blast cells with 100% accuracy.
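For reference, the accuracy figures above can be read as the fraction of cells classified correctly. The sketch below shows how such a figure is typically computed from true and predicted labels; the function names and the label convention (True for blast, False for non-blast) are assumptions, and no data from the project is reproduced here.

```python
def confusion_counts(true_labels, predicted_labels):
    """Count (tp, fp, tn, fn) where True means 'blast cell'."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1
        elif truth and not predicted:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label matches the true label."""
    tp, fp, tn, fn = confusion_counts(true_labels, predicted_labels)
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no labels given")
    return (tp + tn) / total


def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity); None where a rate is undefined."""
    tp, fp, tn, fn = confusion_counts(true_labels, predicted_labels)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

On a test set drawn only from healthy patients, every true label is non-blast, so accuracy equals specificity and sensitivity is undefined. A result on such a set shows that the classifier avoids false positives but says nothing about how many blast cells it would miss.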

Future Work

To improve the project, I will collect data from local hospitals. I plan to obtain images of the entire blood slide of each patient so that the project can be extended to diagnosing the disease. I will also obtain original images of peripheral blood smears from patients with ALL to test my algorithm, since the images I obtained from ALL-IDB came with labels. Peripheral blood smears from patients without ALL will also be acquired to get more data. Further, I will apply the algorithm not only to patients with ALL but also to those with acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS), which are also types of blood cancer.