Ashkan is finishing his PhD in mechanical engineering at the University of Texas at Arlington. His research focuses on finite element methods and reduced-order modeling. He works on improving iterative solvers and reduced-order modeling techniques and on extending their applications to optimization and inverse problems. Ashkan’s interest in innovation has steered his attention increasingly toward computer science and mathematics. He has experience with computer programming and is familiar with data mining and machine learning, and pursuing computer science is one of his long-term goals. Besides sports, his hobbies include reading books, articles, and news in mathematics, technology, and the physical sciences.
Project: Cellular Automaton on GPU
Cellular automata (CA) are simple parallel programs, and the graphics processing unit (GPU) is hardware designed to accelerate exactly such programs. Therefore, it makes a lot of sense to run CA programs on GPUs. General-purpose programming on GPUs is relatively new in the scientific computing community, and the two major GPU manufacturers compete for market share. However, since NVIDIA GPUs and the CUDA programming language are widely adopted by the research community and industry, it makes sense to develop on this particular framework.
In this project I want to write two GPU codes, for 1D and 2D cellular automata. Each program should be general and accept the same inputs as the native Mathematica function CellularAutomaton. I will start with the 1D problem and try to make it as general as possible. The challenge in deploying CA on the GPU is that a rule-based implementation with code branching will be slow on the single-instruction, multiple-data (SIMD) architecture of GPU processors. In the NKS book, Stephen Wolfram derives single formulas for the elementary cellular automata that can replace rule substitution, although this approach might be too complicated for 2D programs.
The implementation is not straightforward. One idea is to write a GPU kernel that, instead of taking the rule number as an argument, takes a function pointer. Mathematica code would then take the rule number, simplify it into a function, compile that function into GPU device code, and pass the resulting function pointer to the cellular automaton kernel. In the 2D case, if rule simplification is not possible, we can use a lookup table and still avoid code branching; this second idea costs an extra memory operation that gathers the substitution rules from memory. In either case the CA update will be synchronous across all threads.