
Wolfram Summer School


Lina Baquero

Technology and Innovation

Class of 2017


Lina graduated from Liberty University in 2015 with a BA in computer science. Currently she is working with a cybersecurity software application that distributes and manages cryptographic keys for companies around the world. In addition, she is applying to start her master’s in statistics in 2018, given her passion for mathematics and interest in artificial intelligence. In her spare time, she enjoys playing guitar, running long distance and listening to podcasts.

Computational Essay

An Introduction to Markov Chains

Project: Cellular Automata Unsupervised Clustering

Goal of the project:

To find clusters of cellular automata based on features of the patterns they form after a certain number of generations.

Summary of work:

To build a dataset for a given cellular automaton rule, binary images were generated from initial conditions given by the binary representations of the integers 1 to 60, each evolved for 100 generations. These ranges were chosen to fit the available computing capacity and time, and further testing showed them to be reasonable.
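The dataset construction described above can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the code used in the project: it assumes elementary (two-state, nearest-neighbor) rules, and the function names `evolve_eca` and `build_dataset`, the image width, and the centered placement of the initial condition are all assumptions made for the example.

```python
import numpy as np

def evolve_eca(rule, init_int, generations=100, width=None):
    """Evolve an elementary cellular automaton; return a binary image.

    Row 0 holds the initial condition (binary digits of init_int, centered
    on a background of zeros); each later row is one generation.
    """
    bits = [int(b) for b in bin(init_int)[2:]]
    if width is None:
        # Assumption: wide enough that the pattern never reaches the edges.
        width = len(bits) + 2 * generations
    row = np.zeros(width, dtype=np.uint8)
    start = (width - len(bits)) // 2
    row[start:start + len(bits)] = bits

    # Lookup table: neighborhood value 4*left + 2*center + right -> new state.
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)

    image = np.zeros((generations + 1, width), dtype=np.uint8)
    image[0] = row
    for t in range(1, generations + 1):
        left = np.roll(row, 1)    # periodic boundary
        right = np.roll(row, -1)
        row = table[4 * left + 2 * row + right]
        image[t] = row
    return image

def build_dataset(rule, first=1, last=60, generations=100):
    """One image per initial condition first..last, all with the same width."""
    width = last.bit_length() + 2 * generations
    return [evolve_eca(rule, n, generations, width)
            for n in range(first, last + 1)]

dataset = build_dataset(rule=30)  # 60 binary images for rule 30
```

Using a common width for every image keeps the later feature vectors comparable across initial conditions.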

The images in this dataset were then grouped into clusters of similar patterns using k-means, an unsupervised clustering algorithm. The workflow consists of preprocessing the cellular automaton images, extracting image features and, finally, running the cluster analysis.
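As a rough sketch of the feature extraction and cluster analysis steps, the following Python code computes a small feature vector per image and runs a plain k-means loop. The three features in `extract_features` are hypothetical stand-ins chosen for illustration; the summary does not say which features the project extracted, and the `kmeans` function here is a textbook version rather than the implementation that was used.

```python
import numpy as np

def extract_features(image):
    """Hypothetical features for one binary image (rows are generations)."""
    density = image.mean()                      # fraction of cells equal to 1
    row_density_std = image.mean(axis=1).std()  # variation between generations
    # Fraction of horizontally adjacent cell pairs that differ.
    flips = np.abs(np.diff(image.astype(int), axis=1)).mean()
    return np.array([density, row_density_std, flips])

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means with Euclidean distance; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep empty clusters fixed.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers
```

Given a list of binary images, `X = np.array([extract_features(im) for im in images])` followed by `labels, centers = kmeans(X, k)` assigns each image to one of `k` clusters. Note that k must be supplied by the caller.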

Results and future work:

Statistical analysis within clusters and human observation both indicated that the process presented in this project, clustering cellular automata from binary images by applying a learning technique to a set of extracted features, is reasonable and effective. However, because only a small amount of a priori information was available and k-means requires the number of clusters k as input, an intermediate clustering procedure based on a nearest-neighbor chain and the standard deviations of the feature vectors was used to estimate k. This second algorithm produced a reasonable k in most cases, but in some cases it produced more clusters than necessary. Further research should therefore include a better technique for optimizing k or a larger set of features. Weighting attributes according to their relevance, instead of measuring similarity with plain Euclidean distance, could also improve the clustering. Finally, data on how clusters change with the number of generations could give more insight into cellular automata behavior.
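One way to read the attribute-weighting suggestion is as a weighted Euclidean distance, d(u, v) = sqrt(sum_i w_i (u_i - v_i)^2). The sketch below is an assumption about how that idea could be tried, not something the project implemented; the function names and the weight values are hypothetical.

```python
import numpy as np

def weighted_euclidean(u, v, weights):
    """Euclidean distance with a non-negative relevance weight per feature."""
    u, v, w = (np.asarray(a, dtype=float) for a in (u, v, weights))
    return float(np.sqrt(np.sum(w * (u - v) ** 2)))

def scale_features(X, weights):
    """Rescale features so plain Euclidean distance equals the weighted one."""
    return np.asarray(X, dtype=float) * np.sqrt(np.asarray(weights, dtype=float))
```

Because scaling each feature by the square root of its weight turns the weighted distance into an ordinary Euclidean one, the weights can be applied to the feature matrix once and any standard k-means routine reused unchanged.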