Nestor Diaz obtained a systems engineer title at Universidad Industrial de Santander and a master's in computer science at Universidad del Valle. Currently, he is in the middle of his PhD at Universidad del Valle. His central research theme is cellular automata identification and its application to real-life problems. His PhD research is focused on cellular automata identification to predict protein contact maps.
Since 2004 he has been a professor at Universidad del Cauca, where his main courses are related to programming languages, complex systems modeling and computational intelligence. Since 2013 he has been a member of the Research Group on Computational Intelligence at Universidad del Cauca.
A New Kind of Science has been an inspiring book in his work, and he believes that Wolfram's ideas will aid complex systems modeling and analysis.
Project: Behavior Classification in Turing Machines
We explored the space of three-state, three-color Turing machines (TM), trying to identify behavior classes in them. The search space is huge—around 200 billion TMs ((2*3 states*3 colors)^(3 states*3 colors)). This implies that we can't classify all TMs, and that the analysis will be made through sampling of the search space. We need to collect around 4,200 TMs for each dataset to achieve a confidence level of 99% with a confidence interval of +/- 2%.
Taking advantage of the Wolfram Language function ClusterClassify, we trained and tested several datasets, applying in each iteration the knowledge acquired in the previous one. We characterize TMs by means of five measures: entropy, evolution toward right and left sides, periodicity and compression as an indicator of repetitiveness. This set of measures was previously used by Paul-Jean Letourneau as a way to identify TMs with interesting behavior (a full description of his approach can be found on the Wolfram Blog under the title "Hunting for Turing Machines at the Wolfram Science Summer School").
We validated our clustering results using silhouettes, which are useful to assess membership of a TM to its cluster, to watch the quality of clusters in a plot and to select the best number of clusters. The ideal case for a cluster's silhouette is a rectangle with height 1.0. Below, we show the silhouettes' diagrams for clustering with four, five and six clusters. Silhouettes allow us to conclude that: (1) four clusters are a bad choice due to cluster 3 having almost every TM misplaced (negative values); and (2) clustering with five and six partitions has better behavior—despite that, we have a high proportion of TMs with the wrong membership in cluster 1. Cluster 1 in the case of five partitions has a bigger proportion of TMs correctly classified, but fails to classify around 1,700 TMs. Cluster 1 for six partitions has more errors, but can classify all TMs in the dataset.
Finally, with a dendrogram we can show relationships between clusters. Cluster 1 appears in a branch apart from other clusters; maybe this is the class of more complex behavior, and this can be the reason behind the high proportion of errors in this cluster. Clusters 2 through 5 appear as part of a subclassification, where clusters 2 and 5 have some kind of similar behavior. Cluster 3 has some resemblance with clusters 2 and 5. Cluster 4, although related, shows a monotonous behavior with few changes over time evolution.