Laura graduated from American University in Washington, DC, in 2012 with a double major in physics and philosophy. For two years, she worked at the National Institutes of Health in Bethesda, Maryland, in the Postbaccalaureate Intramural Research Training Award program. There, she engaged in fMRI methods research, focusing on graph theory metrics for the evaluation of brain connectivity. She continues to collect fMRI data for the Research Foundation of CUNY.
Currently, she is a master’s student in data science at New York University. Recent courses of study include machine learning, text data analysis and implementation of big data methods. Laura looks forward to the Wolfram Summer School and using Wolfram products as tools to further develop these methodological approaches for data analysis.
Project: Predicting and Recommending Connectivity in the Wolfram Language Documentation Graph
The Wolfram Language & System Documentation Center provides descriptions, parameters and methods for each function deployed in Wolfram Mathematica. Additionally, these webpages include a section entitled “See Also,” where other Mathematica functions are hyperlinked. The purpose of these links is to guide the user to other functions that are likely to be of interest to them, given that they have visited the documentation of the current function.
However, it is not always clear what criteria should be used to determine when one function should link to another. Generally, the connections are made at the developer’s discretion.
Can we predict where connections between functions will be present?
Can we recommend where connections should be present?
At first, a qualitative graph analysis seems like the intuitive place to start: we have a set of connections, and we want to visualize how they are organized and perhaps describe some characteristic behavior. However, simply plotting the graph shows this approach to be unwieldy.
Instead, we computed graph network metrics and fed them into Classify. Each function was treated as a node, and each connection as a directed edge. For each connection (and a balanced set of “potential” connections), we computed over a dozen features, including:
- Ratio of degrees in and out for a pair of functions
- Shortest path length between the functions
- Number of connections in common
- Ratio of PageRankCentrality values
- String overlap in the function names
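As a rough illustration of how pair features like these might be computed, here is a minimal Python sketch using only the standard library. It is not the project's Wolfram Language code. The toy graph, the function names (`pair_features`, `pagerank`, and so on), the damping factor, and the use of `difflib.SequenceMatcher` for string overlap are all assumptions made for the example; the source does not say how each feature was actually defined.

```python
from collections import deque
from difflib import SequenceMatcher

# Toy "See Also" graph (hypothetical data): function -> functions it links to.
# Every link target must also appear as a key.
LINKS = {
    "Plot": ["ListPlot", "Plot3D"],
    "ListPlot": ["Plot", "ListLinePlot"],
    "ListLinePlot": ["ListPlot"],
    "Plot3D": ["Plot"],
}

def in_degree(graph, node):
    """Number of functions whose link list contains `node`."""
    return sum(node in targets for targets in graph.values())

def neighbors(graph, node):
    """Functions connected to `node` in either direction."""
    incoming = {n for n, targets in graph.items() if node in targets}
    return set(graph[node]) | incoming

def shortest_path_length(graph, source, target):
    """Directed breadth-first distance; None if `target` is unreachable."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank (damping value is an assumption)."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = graph[n] or nodes  # dangling nodes spread rank evenly
            for t in targets:
                new[t] += damping * rank[n] / len(targets)
        rank = new
    return rank

def pair_features(graph, u, v, ranks):
    """Feature dictionary for the directed pair u -> v."""
    common = (neighbors(graph, u) & neighbors(graph, v)) - {u, v}
    return {
        "out_degree_ratio": len(graph[u]) / max(len(graph[v]), 1),
        "in_degree_ratio": in_degree(graph, u) / max(in_degree(graph, v), 1),
        "shortest_path": shortest_path_length(graph, u, v),
        "common_neighbors": len(common),
        "pagerank_ratio": ranks[u] / ranks[v],
        "name_overlap": SequenceMatcher(None, u, v).ratio(),
    }

ranks = pagerank(LINKS)
features = pair_features(LINKS, "Plot", "ListLinePlot", ranks)
print(features)
```

Each dictionary would correspond to one row of training data, labeled by whether the link u -> v exists. The sketch computes the shortest path on the graph as given; whether the original feature excluded the candidate edge itself is not stated in the source.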
Using Classify alone, we were able to predict whether a directed connection exists between two functions with over 97% accuracy.
With this model, we can build a tool to help developers inquire about specific functions. We can now take any function, consider directed pairs with all other functions and predict whether the directed connection is currently present. If not, we recommend connections that the model predicts with high probability, and display how the addition of these recommended connections would change that function’s local community.
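The recommendation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_link_probability` is a hypothetical placeholder standing in for the trained classifier's predicted probability, and the toy graph, the `recommend_links` name, and the 0.5 threshold are likewise invented for the example.

```python
# Toy "See Also" graph (hypothetical data): function -> functions it links to.
LINKS = {
    "Plot": ["ListPlot", "Plot3D"],
    "ListPlot": ["Plot", "ListLinePlot"],
    "ListLinePlot": ["ListPlot"],
    "Plot3D": ["Plot"],
}

def toy_link_probability(graph, source, target):
    """Placeholder score, NOT the project's model: the fraction of
    `source`'s existing links that themselves link to `target`."""
    out = graph[source]
    if not out:
        return 0.0
    return sum(target in graph[mid] for mid in out) / len(out)

def recommend_links(graph, source, score, threshold=0.5):
    """Score every directed pair (source, other) that is not already linked
    and return those at or above `threshold`, highest score first."""
    candidates = []
    for target in graph:
        if target == source or target in graph[source]:
            continue  # skip self-pairs and links that already exist
        probability = score(graph, source, target)
        if probability >= threshold:
            candidates.append((target, probability))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

recommendations = recommend_links(LINKS, "Plot", toy_link_probability)
print(recommendations)
```

Passing the scoring function as an argument keeps the recommendation loop independent of the model, so a real classifier's probability output could be swapped in for the placeholder.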
Favorite 3-Color 2D Totalistic Cellular Automaton
Silly, but after writing the code using a rule I knew had interesting behavior, I entered my date of birth as the first new rule number to test. I was excited to find that it gave this interesting array plot. Additionally, again by chance, the number of steps I first tried happened to give a nice “frame” to the array, as if it were in a photo album. This is not the case with other attempted step numbers. Feeling guilty that I found an interesting rule on the first shot, I tried all the other dates of birth I could remember of people I knew, and none of them had complicated behavior.