Convolution Tutorial
Sign language has been used for centuries to help the hearing impaired communicate. It consists of hand gestures and body movements that convey information. In this tutorial, we are going to train a machine learning model to identify pictures of hand signs representing the digits 0 through 9. You can also watch this model being put together in the following video:
The model follows the LeNet approach and consists of two Convolution (ConV) layers followed by a Dense layer. Both ConV layers use ReLU activation functions, while the Dense layer uses a Softmax function, and its output is passed to a Classification training component. The model also makes use of pooling, which is configured via the ConV components.
The architecture of this model is illustrated in Figure 1:
A ConV neural network is a deep learning algorithm that learns from an input image by applying a number of filters and an activation function at each layer. In other words, when a user loads an image, the ConV layers learn different features of that image, such as vertical edges, horizontal edges, lines, and colors.
After convolution is performed, the result is fed to a Pooling layer. The goal of pooling is to downsample the feature maps (i.e., the outputs) from the previous step, summarizing the results before passing them to the next layer. There are two pooling methods: average pooling and max pooling. For this tutorial, we will use max pooling, in which the feature map is partitioned into regions and the maximum value is taken from each region. For example, in Figure 2 below, a feature map composed of a grid of values is divided into four color-coded regions, and the maximum value is chosen from each:
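To make this concrete, here is a minimal NumPy sketch of 2x2 max pooling with a stride of 2. The feature map values are purely illustrative and are not taken from the dataset:

```python
import numpy as np

# A small 4x4 feature map (hypothetical values, for illustration only).
feature_map = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 4],
    [3, 1, 6, 8],
])

# 2x2 max pooling with stride 2: split the map into non-overlapping
# 2x2 regions and keep the largest value from each region.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 5]
#  [7 9]]
```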
The ConV and Pooling layers are then combined to make up one full layer. In PerceptiLabs, you will define both the ConV and Pooling layers via Convolution components.
We've provided data in this GitHub repo so that you can use it for this tutorial. The data is contained in a file named data.zip and consists of two files:
X.npy: contains 64x64 grayscale images of hand signs, where each pixel is represented as a normalized grayscale value from 0.0 through 1.0.
Y.npy: contains the respective labels which correspond to the hand sign images.
The repo also contains the PerceptiLabs model.json file that you can load – you just need to assign X.npy and Y.npy to the model's first and second Data components respectively.
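If you'd like to inspect the data before loading it into PerceptiLabs, a quick check along the following lines can help. The file paths are assumptions; adjust them to wherever you unzipped data.zip:

```python
import numpy as np

# Load the sample data after unzipping data.zip.
X = np.load("X.npy")  # hand-sign images
Y = np.load("Y.npy")  # corresponding labels

print(X.shape)            # expected: (num_samples, 64, 64) grayscale images
print(Y.shape)            # one label entry per image (encoding may be one-hot)
print(X.min(), X.max())   # normalized grayscale values in [0.0, 1.0]
```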
The tutorial below describes how to create this model from scratch.
This section provides the steps for the tutorial in which you will build a model from scratch in PerceptiLabs for classifying images of sign language hand movements. The completed model will look as follows in PerceptiLabs:
Note
This tutorial assumes you have a basic familiarity with using PerceptiLabs. If you have not already done so, we recommend that you first follow the Basic Image Recognition tutorial to become familiar with building models in PerceptiLabs.
Ensure you have unzipped the data from the sample GitHub repo.
Create a new Empty model in PerceptiLabs.
Drag and drop two Data components into your model.
Select the first Data component to display its settings in the Settings pane and assign the X.npy file from the sample repo to it.
Repeat the previous step for the second Data component, but assign Y.npy to it from the sample repo.
The image data loaded from X.npy above needs to be converted from two-dimensional arrays of 64x64 values into three-dimensional arrays of 64x64x1 elements for use by subsequent components that will be added to the model.
Note
The "1" in the array size of 64x64x1 refers to the number of color channels. Using 1 specifies normalized grayscale values. Using 3 would specify RGB values.
Follow the steps below to perform this conversion:
Click Processing in the toolbar and drag and drop a Reshape component onto the workspace area to the right of the first Data component containing the image data.
Click the first Data component in your project, and drag a connector from that component's output field to the Reshape component's input field. This causes the output of the Data component to be input into the Reshape component.
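Behind the scenes, this step amounts to adding a single grayscale channel to each image. As a rough illustration (not the component's actual implementation), the same conversion in NumPy looks like this:

```python
import numpy as np

X = np.load("X.npy")             # shape: (num_samples, 64, 64)

# Add a trailing channel dimension so each image becomes 64x64x1,
# which is what the downstream Convolution components expect.
X_reshaped = X.reshape(-1, 64, 64, 1)
print(X_reshaped.shape)          # (num_samples, 64, 64, 1)
```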
You're now ready to add ConV layers for extracting features. Follow the steps below to add the Convolution components to your model:
Click Deep Learning and drag and drop a Convolution component onto your model, to the right of the Reshape component.
Drag a connector from the Reshape component's output to your Convolution component's input.
Select the Convolution component in your project to view its settings in the Settings pane on the right.
Enter the following hyperparameters for the Convolution component:
Stride: 1
Patch size: 3
Feature maps: 8
Pooling: Yes
Pooling area: 2
Pooling stride: 2
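For reference, these settings roughly correspond to the following Keras layers. The padding mode and the explicit ReLU activation are assumptions for illustration, not values read from the PerceptiLabs component:

```python
import tensorflow as tf

# Rough Keras equivalent of the first Convolution component's settings:
# stride 1, 3x3 patch, 8 feature maps, followed by 2x2 max pooling
# with stride 2.
conv1 = tf.keras.layers.Conv2D(
    filters=8, kernel_size=3, strides=1,
    padding="same", activation="relu",  # padding/activation are assumptions
)
pool1 = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)

x = tf.keras.Input(shape=(64, 64, 1))  # reshaped grayscale images
y = pool1(conv1(x))
print(y.shape)                          # (None, 32, 32, 8)
```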
Add a second Convolution component to your model.
Drag a connector from the first Convolution component's output to the second Convolution component's input.
Select the second Convolution component in your project to view its settings in the Settings pane on the right.
Enter the following hyperparameters for the second Convolution component:
Stride: 1
Patch size: 3
Feature maps: 16
Pooling: Yes
Pooling area: 2
Pooling stride: 2
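Continuing the illustration, the second Convolution component's settings roughly correspond to a similar Conv2D/MaxPooling2D pair, now with 16 feature maps (again, the padding and activation are assumptions):

```python
import tensorflow as tf

# Rough Keras equivalent of the second Convolution component's settings:
# stride 1, 3x3 patch, 16 feature maps, 2x2 max pooling with stride 2.
conv2 = tf.keras.layers.Conv2D(
    filters=16, kernel_size=3, strides=1,
    padding="same", activation="relu",
)
pool2 = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)
# Stacked after the first Convolution/pooling pair, the 32x32x8 feature
# maps from that layer are reduced to 16x16x16 here.
```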
You're now ready to add a Dense layer to the model. This component takes all of the outputs from the previous component and connects them to a set of new outputs, which in this case serve as the model's prediction/output.
Follow the steps below to add and configure this layer:
Click Deep Learning and drag and drop a Dense component onto your model, to the right of the second Convolution component.
Drag a connector from the second Convolution component's output to the Dense component's input.
Select the Dense component in your model to view its settings in the Settings pane on the right.
Enter the following hyperparameters for the Dense component:
Neurons: 10
Activation Function: Softmax
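As a rough point of reference, the Dense component corresponds to flattening the pooled feature maps and applying a 10-neuron fully connected layer with a Softmax activation. The explicit Flatten layer and the input shape below are assumptions for illustration:

```python
import tensorflow as tf

# Rough Keras equivalent of the Dense component: flatten the pooled
# feature maps and map them to 10 class probabilities (digits 0-9).
flatten = tf.keras.layers.Flatten()
dense = tf.keras.layers.Dense(units=10, activation="softmax")

x = tf.keras.Input(shape=(16, 16, 16))  # assumed output of the second conv/pool pair
probs = dense(flatten(x))
print(probs.shape)                       # (None, 10) class probabilities
```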
The model is almost complete; it just needs a training component to train the model and correlate the images with their labels.
Follow the steps below to complete the model:
Click Training and drag and drop a Classification component onto your model, to the right of all of the other components. The Classification training component is specialized to train classification models.
Drag a connector from the Dense component's output to the Classification component's prediction input.
Drag a connector from the second Data component's output to the Classification component's labels input.
Verify that there are no errors or warnings in the model. For more information see Debugging Models.
Train the model and (optionally) export it.
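If you ever want to reproduce the training step outside PerceptiLabs, a minimal Keras sketch of the full pipeline might look like the following. The optimizer, loss, number of epochs, and the assumption that Y.npy is one-hot encoded are illustrative choices rather than settings taken from the Classification component:

```python
import numpy as np
import tensorflow as tf

X = np.load("X.npy").reshape(-1, 64, 64, 1)  # images with a channel dimension
Y = np.load("Y.npy")                          # labels (assumed one-hot encoded)

# Two Convolution/pooling pairs, a Dense Softmax layer, and a training step.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(8, 3, strides=1, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),
    tf.keras.layers.Conv2D(16, 3, strides=1, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # use sparse_categorical_crossentropy
              metrics=["accuracy"])              # if the labels are integer-encoded
model.fit(X, Y, epochs=10, validation_split=0.2)
```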
For more information and to ask questions, be sure to check out our forums.