Wolfram Summer School

Alumni

Jorge Mahecha

Educational Innovation

Class of 2016

Bio

Jorge has worked in the education sector for 19 years. During this time, he has served in several leadership, research, teaching and consultancy positions. He holds a master’s in curriculum and teaching from Boston University and a master’s in educational administration from Los Andes University in Bogotá, Colombia, his native city. Jorge’s professional interests include the professionalization of school teaching, methodologies for social program evaluation and the use of educational technology to understand statistical concepts. A board member and cofounder of Teach for Colombia, Jorge is now a PhD student in the Educational Research, Measurement, and Evaluation program at the Lynch School of Education at Boston College.

As a side note, he found his first job in research by playing around with Mathematica 3.0. He was computing Fourier transforms of genetic sequences as a side project to his job at the time. He was able to present that work to the director of the Chaos and Complex Systems group at Los Andes University in Bogotá, who hired him to work on analyzing the information content of DNA sequences.

Project: Learning Statistics with Mathematica: A Basic Tutorial for Social Scientists/Educational Researchers

Despite Mathematica’s powerful features for data analysis, it remains, at least in my experience, seldom used in social science research, particularly in fields like education, psychology and social work. Platforms like R, Stata and SPSS are the traditional tools of choice in these areas. This tradition is reinforced by the lack of examples in the available Mathematica documentation showing how to handle and analyze the types of datasets commonly used by researchers in these fields. For example, although there is documentation on the commonly used ANOVA technique for comparing the means of different groups, the available example uses short, manually entered lists of data. Although this shows that Mathematica can indeed be used to run ANOVAs, the example is somewhat artificial, since educational researchers would seldom, if ever, perform an ANOVA on such a small dataset.

The datasets frequently used in educational and psychological research are arrays of data in which nominal, ordinal and continuous variables are recorded for a set of individuals. In Mathematica, these structures are represented as datasets. The aim of my project is to provide examples of how to import, manipulate and analyze these kinds of datasets using Mathematica. In doing so, the project adds to the existing reference and support materials for the software in the social sciences, presenting Mathematica as a flexible and powerful tool for data analysis. Because a large number of higher education institutions make Mathematica available to their students and faculty, efforts like the present notebook could help them realize that they have at their disposal an incredibly powerful tool for data analysis that is currently underused.

Goals of the project:
The main goal of this project is to present a set of examples of Mathematica’s data analysis and manipulation capabilities for datasets like those frequently used in the social sciences, which hold records for relatively large numbers of individuals. These records can include different types of qualitative and quantitative information. In support of this main goal, the project also has more specific goals, which are to show:

  1. How datasets should be imported so they are properly read on different platforms (like OS X and Windows)
  2. How to obtain descriptive statistics and graphics from a dataset
  3. How to perform common statistical procedures like t-tests, ANOVAs or different types of regressions
  4. How to analyze dimensionality in survey data
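
As a rough illustration of what these goals might look like in practice, the following Wolfram Language sketch touches on each of them. It is not taken from the project notebook: the file name `survey.csv`, the column names (`group`, `hours`, `score`) and all data values are hypothetical, and the eigenvalue check at the end is only one simple way to get a first look at dimensionality.

```wolfram
(* Goal 1: build a file path that works on any operating system.
   "survey.csv" is a hypothetical file name. *)
path = FileNameJoin[{Directory[], "survey.csv"}];
(* Once such a file exists, SemanticImport[path] returns a Dataset. *)

(* Hypothetical toy data standing in for an imported file *)
data = Dataset[{
    <|"group" -> "A", "hours" -> 2., "score" -> 61.|>,
    <|"group" -> "A", "hours" -> 4., "score" -> 70.|>,
    <|"group" -> "A", "hours" -> 5., "score" -> 74.|>,
    <|"group" -> "B", "hours" -> 3., "score" -> 58.|>,
    <|"group" -> "B", "hours" -> 6., "score" -> 66.|>,
    <|"group" -> "B", "hours" -> 8., "score" -> 79.|>
   }];

(* Goal 2: descriptive statistics and a basic plot *)
data[Mean, "score"]                      (* overall mean score *)
data[GroupBy["group"], Mean, "score"]    (* mean score per group *)
Histogram[Normal[data[All, "score"]]]

(* Goal 3: a two-sample t-test and a simple linear regression *)
scoresA = Normal[data[Select[#group == "A" &], "score"]];
scoresB = Normal[data[Select[#group == "B" &], "score"]];
TTest[{scoresA, scoresB}]                (* returns a p-value *)

pairs = Normal[data[All, {#hours, #score} &]];   (* list of {hours, score} pairs *)
model = LinearModelFit[pairs, x, x];
model["ParameterTable"]

(* Goal 4: a first look at dimensionality. The item responses are
   simulated here (hypothetical); large eigenvalues of the correlation
   matrix suggest dominant dimensions. *)
SeedRandom[1];
items = RandomVariate[NormalDistribution[], {50, 4}];
Eigenvalues[Correlation[items]]
```

Building the path with `FileNameJoin` rather than typing a separator by hand is one way to keep the import step identical on OS X and Windows. An ANOVA could be run along similar lines after loading the standard package with ``Needs["ANOVA`"]``.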

Datasets:
Two datasets are used in this notebook. One is a relatively small dataset used to demonstrate some of the procedures. The other is a larger dataset retrieved from the web, containing actual survey data.