Aryan (nickname: Po) is a rising junior interested in math and computer science. He spent his early childhood in the Bay Area and then moved to Pune, a city near the western coast of India that is renowned for its educational and cultural centers (but, more importantly, for offering the world's best Vada Pav). Recently, he has been working with an NGO/orphanage in the city to impart basic computer literacy and programming skills to socioeconomically underprivileged children. In his spare time, he enjoys playing tennis, reading books, and gulping down food.
Project: Creating a Common Word List for Marathi Language
The purpose of this project is to generate a common word list for Marathi and then use the data to obtain some interesting results. Marathi is a regional language spoken in western India, and the data was extracted from Marathi Wikipedia pages. With some tweaks, the code can also be used to extract text from other websites.
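The write-up does not say how the words were pulled out of the Wikipedia pages. As a minimal sketch, assuming a page's HTML has already been downloaded and that a "Marathi word" is any run of Devanagari characters (Unicode block U+0900 to U+097F, minus the danda punctuation marks and Devanagari digits), Python's standard library can strip the markup and collect the words. The names `TextExtractor` and `extract_marathi_words` are hypothetical, not taken from the original project.

```python
import re
from html.parser import HTMLParser

# Assumption: a Marathi word is a run of Devanagari characters
# (U+0900-U+0963 and U+0970-U+097F). This skips the danda marks
# (U+0964, U+0965) and the Devanagari digits (U+0966-U+096F).
DEVANAGARI_WORD = re.compile(r"[\u0900-\u0963\u0970-\u097F]+")


class TextExtractor(HTMLParser):
    """Collects text nodes from HTML, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_marathi_words(html: str) -> list[str]:
    """Hypothetical helper: strip markup, then return Devanagari words in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return DEVANAGARI_WORD.findall(" ".join(parser.parts))


sample = (
    "<html><body><p>पुणे हे शहर आहे. Pune is a city.</p>"
    "<script>var label = 'नाही';</script></body></html>"
)
print(extract_marathi_words(sample))
```

English text and script contents are dropped, so only the four Marathi words in the paragraph are returned. On a real Wikipedia page, Marathi navigation and menu labels would also be counted unless the extraction is limited to the article body.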
Summary of Results
We extracted the data (Marathi words) from Marathi Wikipedia and counted how often each word occurred. We then created a word cloud and a dataset for data visualization, and it was fascinating to explore them.
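The counting step is not shown in the write-up. A minimal sketch, assuming the extracted words arrive as a list of strings, uses `collections.Counter` to rank words by frequency and the `csv` module to save the ranked list as a dataset. The helper names `build_word_list` and `write_word_list_csv`, and the file name `marathi_words.csv`, are hypothetical.

```python
import csv
import io
from collections import Counter


def build_word_list(words, top_n=None):
    """Count each word and return (word, count) pairs, most frequent first.

    top_n=None keeps every word; an integer keeps only the top_n entries.
    """
    return Counter(words).most_common(top_n)


def write_word_list_csv(pairs, out):
    """Write (word, count) pairs as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["word", "count"])
    writer.writerows(pairs)


words = ["आहे", "पुणे", "आहे", "शहर", "आहे", "पुणे"]
pairs = build_word_list(words, top_n=2)
print(pairs)  # [('आहे', 3), ('पुणे', 2)]

# StringIO keeps the demo self-contained; for a real file, use
# open("marathi_words.csv", "w", encoding="utf-8", newline="").
buffer = io.StringIO()
write_word_list_csv(pairs, buffer)
print(buffer.getvalue())
```

The original project's word-cloud tooling is not stated. One option would be passing `dict(pairs)` to the third-party `wordcloud` package's `WordCloud.generate_from_frequencies`, with `font_path` pointing to a font that includes Devanagari glyphs; without such a font, the words cannot render.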
One option for the future is to develop datasets for specific categories, such as food items or sports. If modified, the architecture of this project could also be used to find common words on social media platforms. The code could likewise generate common word lists for other languages from online sources such as Wikipedia.