Wolfram Summer School

Alumni

Rohan Saxena

Technology and Innovation

Class of 2017

Bio

Rohan Saxena is a sophomore majoring in computer science at BITS Pilani, India. His academic interests, past projects and work in college and beyond lie in the fields of artificial intelligence, robotics and machine learning, especially deep learning.

He became fascinated by autonomous robots during his freshman year at the college robotics lab. Even today, he can be found there playing around with one of the bots.

Passionate about self-driving cars, Rohan wants to make the technology robust so as to bring these machines to developing countries like India to make travel safer and cheaper.

In his free time, Rohan enjoys reading, blogging and swimming.

Computational Essay

Artificial Neuron

Project: Semantic Segmentation of Urban Street Scenes

Goal of the project:

The ability to correctly analyze the ambient environment is critical for the development of autonomous systems such as self-driving cars. The aim of my project is to perform a pixel-wise semantic segmentation of the images contained in the Cityscapes dataset. Cityscapes is a challenging dataset consisting of a comprehensive set of images from various kinds of urban settings.

Summary of work:

I chose to implement a deep neural network using the pyramid pooling architecture described in “Pyramid Scene Parsing Network.” I used a pre-trained ResNet augmented with a custom pyramid pooling scheme to extract features from the images effectively. For each pixel in the image, my network outputs a set of probabilities, one for each of 34 classes, indicating how likely that pixel is to belong to each class.
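To make the pyramid pooling idea concrete, here is a minimal sketch in plain Python, not the project's actual network. It averages a single-channel feature map into grids of several sizes, stretches each grid back to the original size, and stacks the results next to the input. The bin sizes (1, 2, 3, 6) follow the paper's defaults and are an assumption here, since the custom scheme used in the project is not specified. The function names are hypothetical, nearest-neighbor resizing stands in for the interpolation a real network would use, and the per-level convolutions are omitted.

```python
def adaptive_avg_pool(fmap, bins):
    """Average a 2D feature map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Floor for the start edge, ceiling for the end edge of each bin.
        r0 = (i * h) // bins
        r1 = ((i + 1) * h + bins - 1) // bins
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(grid, h, w):
    """Stretch a square grid back to h x w by nearest-neighbor lookup."""
    b = len(grid)
    return [[grid[(r * b) // h][(c * b) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pool(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the input map plus one context map per pyramid level.

    bin_sizes is an assumed default, not the project's configuration.
    """
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]
    for bins in bin_sizes:
        pooled = adaptive_avg_pool(fmap, bins)
        channels.append(upsample_nearest(pooled, h, w))
    return channels


# Toy 6 x 6 feature map with values 0..35.
fmap = [[r * 6 + c for c in range(6)] for r in range(6)]
channels = pyramid_pool(fmap)
print(len(channels))       # 5: the input plus four pyramid levels
print(channels[1][0][0])   # 17.5: the global average, copied to every pixel
```

The coarsest level gives every pixel the same global summary, while finer levels keep progressively more local detail. Stacking them lets later layers combine scene-wide context with local features.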

Results and future work:

After a day of training on an NVIDIA Tesla K80 GPU, my network produces segmentations whose quality falls between that of the coarse and the fine annotations of the Cityscapes dataset. This result can be improved by training for a longer time or by modifying the architecture of the neural network. The last layer of the pyramid network upsamples its input by a factor of 8. I believe the performance can be increased by replacing this drastic upsampling with a series of deconvolution layers. This will, however, come at the cost of more parameters.
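The parameter cost mentioned above can be estimated with a short calculation. The sketch below assumes three deconvolution (transposed convolution) layers, each with a 4 x 4 kernel, stride 2, padding 1, and 34 input and output channels. None of these values come from the project; they are chosen only so that each layer doubles the resolution and the three together reach the same factor of 8. A fixed interpolation-based resize, by contrast, has no learnable weights.

```python
def deconv_output_size(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def deconv_params(c_in, c_out, kernel=4, bias=True):
    """Learnable weights in one transposed convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


size = 64            # assumed side length of the map before upsampling
total_params = 0
for _ in range(3):   # three layers, each doubling the resolution
    size = deconv_output_size(size)
    total_params += deconv_params(34, 34)

print(size)          # 512, i.e. 64 * 8
print(total_params)  # 55590 extra weights, versus 0 for a fixed resize
```

Under these assumptions the learned upsampling adds roughly 56,000 parameters. The count grows with the square of the kernel size and with the product of the channel counts, so wider intermediate layers would cost considerably more.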