Wolfram Summer School

Alumni

Bhubanjyoti Bhattacharya

Technology and Innovation

Class of 2017

Bio

Bhubanjyoti is a particle phenomenologist whose recent work focuses on the physics of heavy and light quarks, the Higgs boson and dark matter. He earned a bachelor’s degree in physics at the erstwhile Presidency College in his hometown of Kolkata, India, in 2004; a master’s degree in physics at the Indian Institute of Technology, Kanpur, in 2006; and a PhD in (you guessed it!) physics at the University of Chicago in 2011. Bhubanjyoti spent a few years as a postdoc at the University of Montreal and is currently a postdoc at Wayne State University in Detroit, Michigan.

When not playing with data or exploring theories beyond the Standard Model of particle physics, Bhubanjyoti enjoys volunteering with the local children’s center, taking road trips with his spouse and hanging out on the couch with his greyhound.

Computational Essay

Motion of a Classical Particle in a Box

Project: What Can Special Characters in a Paper on arXiv.org Tell Us?

Goal of the project:

Analyzing special characters in papers on arXiv.org.

Summary of work:

We took data from MREC, the Mathematical REtrieval Collection. Our primary goal was to measure how often special characters (Greek letters and mathematical symbols) appear in arXiv articles and to study that frequency as a function of time, identifying periods of growth, stability and decline in usage. From the corpus, we created a dataset containing {Year and Month, Type, Number, Title, Symbols} for each paper, and we used it to analyze the frequency of symbols in different arXiv types. For visualization, we used WordCloud and DateListStepPlot. The figure shows the distribution of cumulative symbol frequencies in different arXiv types from 1998/01 to 2006/12. We also applied Classify to a portion of our dataset to classify articles by arXiv category. Checking the classifier on a part of the data gave a low success rate, so we concluded that the classifier needs further input.
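The counting step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Wolfram Language code used in the project: the sample records, the names `special_symbols` and `cumulative`, and the rule for what counts as a "special character" (Greek letters plus Unicode math symbols) are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import accumulate
import unicodedata


def special_symbols(text):
    """Return Greek letters and math symbols found in text (assumed definition)."""
    found = []
    for ch in text:
        is_greek = unicodedata.name(ch, "").startswith("GREEK")
        is_math = unicodedata.category(ch) == "Sm"  # Unicode "Symbol, math"
        if is_greek or is_math:
            found.append(ch)
    return found


# Hypothetical records standing in for the dataset: (year-month, type, text).
papers = [
    ("1998-01", "hep-ph", "Decay rate Γ and coupling α"),
    ("1998-01", "math", "Sum ∑ over α"),
    ("1998-02", "hep-ph", "Angles α, β and γ"),
]

# Symbol counts per arXiv type and month.
monthly = defaultdict(Counter)
for month, arxiv_type, text in papers:
    monthly[arxiv_type][month] += len(special_symbols(text))


def cumulative(arxiv_type):
    """Running total of symbol counts for one arXiv type, in month order."""
    months = sorted(monthly[arxiv_type])
    totals = accumulate(monthly[arxiv_type][m] for m in months)
    return list(zip(months, totals))


print(cumulative("hep-ph"))
```

The (month, running total) pairs are the kind of series a step plot of cumulative frequencies would take as input.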

Results and future work:

We constructed a database containing symbols and titles from arXiv.org articles and used it to visualize the frequency distribution of symbols. Our attempt to use the Classify function to predict an article’s arXiv type from its title and symbols did not perform well. As a future objective, we wish to expand the database to include keywords from the articles that are closely related to the symbols, and to construct a probability function that gives the arXiv type from the symbols in an article.
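One possible form for such a probability function is a naive Bayes estimate with add-one smoothing. The Python sketch below is only an illustration of that idea under stated assumptions; it is not a method taken from the project, and the training samples and the names `train` and `posterior` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Collect per-type counts from (arxiv_type, list_of_symbols) pairs."""
    doc_counts = Counter()
    symbol_counts = defaultdict(Counter)
    vocabulary = set()
    for arxiv_type, symbols in samples:
        doc_counts[arxiv_type] += 1
        symbol_counts[arxiv_type].update(symbols)
        vocabulary.update(symbols)
    return doc_counts, symbol_counts, vocabulary


def posterior(model, symbols):
    """Estimate P(type | symbols) with naive Bayes and add-one smoothing."""
    doc_counts, symbol_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for arxiv_type, n_docs in doc_counts.items():
        counts = symbol_counts[arxiv_type]
        denom = sum(counts.values()) + len(vocabulary)
        score = math.log(n_docs / total_docs)  # prior
        for s in symbols:
            score += math.log((counts[s] + 1) / denom)  # smoothed likelihood
        log_scores[arxiv_type] = score
    # Normalize in a numerically stable way.
    peak = max(log_scores.values())
    weights = {t: math.exp(v - peak) for t, v in log_scores.items()}
    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()}


# Hypothetical training data: symbols extracted from three papers.
samples = [
    ("hep-ph", ["α", "γ", "μ"]),
    ("hep-ph", ["α", "μ"]),
    ("math", ["∑", "∈", "α"]),
]
model = train(samples)
probs = posterior(model, ["μ", "α"])
print(max(probs, key=probs.get))
```

Naive Bayes treats symbols as independent given the type, which is a strong simplification; it is chosen here only because it makes the "probability of type given symbols" idea concrete in a few lines.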