Computer vision is a key technology for building algorithms that enable self-driving cars. We've used PerceptiLabs to recreate Nvidia's end-to-end deep learning approach, which maps raw pixels from images captured by front-facing cameras mounted on a car directly to steering commands. Each image is labeled with a steering angle that records the position of the car's steering for that frame. For this model, we used Udacity's car simulator to collect the dataset:
Figure 1: Udacity's Car Simulator
For every frame, the car captures three pictures – left, center, and right – using the cameras fitted to the front of the car:
Figure 2: Example Images From the Three Cameras Mounted to the Front of the Car.
Each frame's steering angle value serves as its label.
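Pairing each captured image with its steering-angle label can be sketched as follows. This is a minimal, hypothetical example assuming the simulator's `driving_log.csv` column order (center, left, right, steering, throttle, brake, speed); the steering correction applied to the side cameras is a common heuristic, not a value taken from our model:

```python
import csv

def load_samples(log_path, correction=0.2):
    """Parse a driving log into (image_path, steering_angle) pairs.

    The left/right images get a small +/- correction so they can be
    used as extra training data (an assumed heuristic, adjust to taste).
    """
    samples = []
    with open(log_path, newline="") as f:
        for center, left, right, steering, *_ in csv.reader(f):
            angle = float(steering)
            samples.append((center.strip(), angle))           # center camera
            samples.append((left.strip(), angle + correction))  # left camera
            samples.append((right.strip(), angle - correction)) # right camera
    return samples
```

Each simulator frame thus yields three training samples, which also helps the model learn to recover from drifting off-center.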
The model is based on the PilotNet model, which is composed of nine layers:
Five convolutional layers. These layers, which form a convolutional neural network (CNN), play a central role in computer vision, namely in learning features directly from image input.
Three Dense layers.
An output layer (implemented as a fully-connected Dense component in PerceptiLabs).
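The nine-layer stack above can be sketched in Keras as follows. The filter counts, kernel sizes, and dense-layer widths here follow Nvidia's published PilotNet architecture; the exact settings in the PerceptiLabs model may differ, so treat this as an illustrative sketch rather than the model from the GitHub repository:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_pilotnet(input_shape=(66, 200, 3)):
    """A sketch of the nine-layer PilotNet stack (assumed hyperparameters)."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        # Five convolutional layers
        layers.Conv2D(24, 5, strides=2, activation="relu"),
        layers.Conv2D(36, 5, strides=2, activation="relu"),
        layers.Conv2D(48, 5, strides=2, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Flatten(),
        # Three dense layers
        layers.Dense(100, activation="relu"),
        layers.Dense(50, activation="relu"),
        layers.Dense(10, activation="relu"),
        # Output layer: a single regressed steering-angle value
        layers.Dense(1),
    ])
```

Because the output is a single continuous value, the model is trained as a regression problem (e.g., with a mean-squared-error loss) rather than as a classifier.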
The PilotNet model that we recreated in PerceptiLabs, along with sample data, is available for you to try on GitHub. The final model looks as follows:
Figure 4: Screenshot of the PilotNet Model in PerceptiLabs.
The data was preprocessed in Google Colab by normalizing it: dividing the entire image matrix by 255 brings all pixel values onto the same scale (i.e., 0 to 1). We've also made some code modifications to our model's components, as described in the README.md file on GitHub.
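The normalization step amounts to a one-line NumPy operation, shown here as a small sketch (the function name is ours, not from the repository):

```python
import numpy as np

def normalize(images):
    """Rescale uint8 pixel values (0-255) into the 0-1 range,
    as described above, by dividing the image matrix by 255."""
    return images.astype(np.float32) / 255.0
```

Keeping inputs on a common 0-to-1 scale generally helps gradient-based training converge more smoothly.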
You can also watch how to build and train this model in the following video: