Zhamilya Bilyalova is a rising junior at the Princeton International School of Mathematics and Science (PRISM), where she was first introduced to machine learning by programming in R and studying how to model and understand complex datasets. She has done several data science projects, has collaborated with a biology research student and is a member of the school’s data science team, which was a finalist in the only international data science competition for high-school students. Zhamilya has always been fascinated by Wolfram technologies and is curious to compare R and Wolfram technologies for machine learning. She is interested in other fields as well, such as applied engineering, and is eager to find connections between them. Back in Kazakhstan, she started programming in Pascal and C++ and participated in national and international math, programming and linguistics competitions. She is also very interested in using data science to understand global problems like the refugee crisis. In her free time, she enjoys reading, running, looking after PRISM’s organic farm, having good conversations and doing art.
Project: Determine If a Piece of Text Is a Question
The goal of this project was to create a classifier using a neural network or the Classify function to determine if a piece of text is a question.
Main Results in Detail
The training dataset consists of two columns: pieces of text and their corresponding class (question or statement). The examples were taken from a Quora question pairs dataset, summaries of Wikipedia articles and movie lines, so that they would be as similar as possible to everyday language. New features were created using the text analysis capabilities of the Wolfram Language, and different statistical models were tried inside the Classify function. A linear neural net was then built and classified the text with 90% accuracy. By comparison, the Classify function reached an accuracy of 85.5%.
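The project's Wolfram Language code is not reproduced here, so the following is only a minimal Python sketch of the general idea, not the project's implementation: derive a few simple features from each piece of text, then train a single linear layer with a sigmoid output (equivalent to logistic regression) to separate questions from statements. The two features (a trailing question mark and whether the first word appears in the hypothetical `QUESTION_STARTERS` list), the toy sentences in `TEXTS`, the learning rate and the number of epochs are all illustrative assumptions, not the project's actual features, data or settings.

```python
import math
import string

# Hypothetical word list (an assumption): wh-words and auxiliary verbs
# that often open a question in English.
QUESTION_STARTERS = {
    "what", "who", "whom", "whose", "which", "when", "where", "why", "how",
    "is", "are", "was", "were", "do", "does", "did", "can", "could",
    "will", "would", "should", "shall", "may", "might", "have", "has", "had",
}

# Toy, illustrative training data (not the project's dataset): 1 = question, 0 = statement.
TEXTS = [
    "What time is it?",
    "Is this seat taken?",
    "You are coming tonight?",
    "where is the station",
    "The train leaves at noon.",
    "I like reading books.",
    "She finished the report yesterday.",
]
LABELS = [1, 1, 1, 1, 0, 0, 0]


def features(text):
    """Map a piece of text to two hand-crafted binary features."""
    stripped = text.strip()
    tokens = stripped.split()
    first = tokens[0].lower().strip(string.punctuation) if tokens else ""
    ends_with_q = 1.0 if stripped.endswith("?") else 0.0
    starts_with_starter = 1.0 if first in QUESTION_STARTERS else 0.0
    return [ends_with_q, starts_with_starter]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(texts, labels, lr=0.5, epochs=3000):
    """Fit one linear layer plus a sigmoid (logistic regression)
    by full-batch gradient descent on the mean log loss."""
    xs = [features(t) for t in texts]
    n_feat = len(xs[0])
    n = len(xs)
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(xs, labels):
            # Derivative of the log loss with respect to the linear output.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(n_feat):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def is_question(text, w, b):
    """Predict 'question' when the sigmoid output exceeds 0.5."""
    z = sum(wi * xi for wi, xi in zip(w, features(text))) + b
    return sigmoid(z) > 0.5


if __name__ == "__main__":
    w, b = train(TEXTS, LABELS)
    print(is_question("Can you help me?", w, b))  # True
    print(is_question("The sky is blue.", w, b))  # False
```

Because a linear model can only weigh these cues independently, a sentence such as "What a beautiful day." would be misread as a question; handling cases like that needs richer features or a network that takes more of the sentence into account.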
This classifier has many potential applications. For instance, it could be used in autocorrection or to analyze audio transcripts for virtual assistants like Siri. Future work will include improving the neural network's accuracy by optimizing its architecture, which would, for example, allow the network to comprehend contextual questions.