Wolfram Computation Meets Knowledge

Wolfram Summer School


Chris Cooley

Summer School

Class of 2015


Chris is currently a rising junior at South River High School. He is most interested in artificial intelligence, language, and the various types of computational systems, but also enjoys math, physics, and computer science in general. In his free time, Chris likes to watch political talk shows, search the internet for interesting articles or comment chains to read, learn about on some topic that has caught his attention, or just experiment with the abilities of his computer. He hopes to graduate high school early and be admitted to MIT.

Project: Symbolic Representations of Programming Languages

Every construct—say a system command, a graphics primitive, or even a simple mathematical operator—in the Wolfram Language is an expression and essentially a tree, represented in the language syntax as Head[subexpressions…]. Heads form the nodes of these trees, “branching” out to their various subexpressions, although heads can potentially act as branches for other nodes. Particular parts in these trees can be accessed through indices, which correspond to the branches you would take to get to a given part from the root node. Note that it is the hierarchical structure of the language that makes it so amenable to its own symbolic manipulation features—features that allow Wolfram Language code to generate and manage itself.

Fortunately, other languages—though likely not designed so centrally around tree-like structures—can be made to approximate this hierarchical organization such that we can take advantage of symbolic representations. The SymbolicC Mathematica package, for example, does exactly that. Most C language features can form natural trees—say an if node branching out to a sub-tree for a comparison operation and another sub-tree or two for the possible paths that control could follow. It is these features that are given wrappers to be used as heads in the Wolfram Language (e.g., CWhile). The wrappers are then manipulated essentially as regular Wolfram Language expressions so as to form C code and a SymbolicC function is used to export these expressions as C source.

However, SymbolicC has its limitations: most glaringly, it cannot import arbitrary C code and generate its symbolic representation. Thus, my project goal is to explore ways in which Mathematica can be made to better interface with any other given language, generating a symbolic wrapper framework for that language and a generally successful system for traversing between symbolic and source code representations of that language’s programs. The main focus is on creating a versatile method by which grammars of context-free programming languages or their context-free subsets, augmented with a little “grammar metadata,” can be used to achieve this goal. A parser using Earley’s algorithm will be implemented in the Wolfram Language to create parse trees from source code, and the metadata included with the grammar will be used to create the symbolic wrappers that will be needed to prune and reformat this parse tree into an abstract syntax tree. This abstract syntax tree can be used to find specific instances of programming language “parts of speech” (e.g., functions, conditionals, type declarations, etc.) so as to create a CodeCases function analogous to the TextCases function (new in Mathematica 10.2)—but for code not English words! Examples of such abstract syntax trees will be included with the project, perhaps using a Linux kernel codebase or a simple operating system written only in x86 assembly. (A symbolic representation associated directly with the production rules of the grammar will also be available.)

Such versatile symbolic manipulation would allow for the automatic generation of programs in a variety of different forms, in languages with many divergent features, easing the exploration of the computational space of short programs written in programming languages. This would help expand the reach of the NKS methodology. Provided that there is enough time, some demonstrations of these ideas will also be included.

Example: Symbolic representation of C code in SymbolicC vs. the actual corresponding C source


CFunction["int", "value", {}, CBlock[{CReturn["valueDefine"]}]]

Source Code Representation:

#include "stdlib.h"
#include "constants.h"
int value()
return valueDefine;