The Deep Learning Components provide common deep learning layers typically found in the hidden layers of deep neural networks.
Many of the Deep Learning Components have the same hyperparameters. Below are the definitions for common hyperparameters found in multiple Components:
- Activation function: specifies the type of activation function to use for the layer. See tf.keras.activations for definitions of specific activation functions.
- Batch normalization: specifies if batch normalization should be used. During training, the distribution of each layer's inputs can change as updates are made to values in previous layers. This phenomenon, known as internal covariate shift, slows down training and makes models with saturating nonlinearities harder to train. Batch normalization can address this by normalizing the values of each mini-batch within the model, potentially allowing for higher learning rates and introducing some regularization.
- Dropout: specifies if dropout should be used. This can help regularize deep neural networks to avoid overfitting, by randomly dropping out nodes during training.
- Include top: specifies whether to include the fully-connected layer at the top of the network.
- Pooling: optional pooling mode for feature extraction when Include top is false. Can be set to:
- None: the output of the model will be the 4D tensor output of the last convolutional block.
- avg: global average pooling will be applied to the output of the last convolutional block, and thus the output of the model will be a 2D tensor.
- max: global max pooling will be applied.
- Trainable: specifies whether the model should update its weights during training. For classification, it's common to set this to false, while for object detection it's common to set it to true.
- Weights: specifies whether ImageNet weights should be used. When ImageNet weights are not used, the weights must be provided in the Component's code using PerceptiLabs' Code Editor.
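As a rough illustration of two of the hyperparameters above, the following sketch normalizes one feature across a mini-batch and applies dropout to a list of activations. It is a minimal pure-Python sketch, not the Components' implementation: the function names, the `epsilon` value, and the sample data are assumptions made for illustration, and the learnable scale and shift that batch normalization layers normally include are omitted.

```python
import math
import random


def batch_normalize(batch, epsilon=1e-3):
    """Normalize one feature across a mini-batch to roughly zero mean, unit variance."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(variance + epsilon) for x in batch]


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in values]


# Hypothetical activations of one neuron for a mini-batch of four samples.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_normalize(activations)
dropped = dropout(activations, rate=0.5, rng=random.Random(0))
print(normalized)
print(dropped)
```

Dropout is only applied during training; at inference time the values pass through unchanged, which is why the surviving values are rescaled here.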
The following are common ways that the Deep Learning Components are connected to other Components via their input and output sockets. Note that these will vary depending on the use case and model configuration settings.
Any with a time dimension (i.e., ≥2d, ignoring the batch dimension)
2D Input, Merge, 2D Convolution, Grayscale, Reshape, Rescale (needs to be 224x224x3 if using weights)
Adds a dense layer (aka fully-connected layer) whereby all outputs from one layer are connected to all inputs of the next layer.
- Neurons: specifies how many neurons the layer comprises. Each neuron will be connected to each neuron of both the input and output Components.
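The full connectivity of a dense layer can be sketched in a few lines of plain Python. The weights, biases, and the `dense` helper below are hypothetical values chosen for illustration; a real layer learns its weights during training.

```python
def relu(value):
    return max(0.0, value)


def dense(inputs, weights, biases, activation):
    """Compute activation(W x + b) for one input vector.

    weights[j][i] connects input i to neuron j, so every input reaches every neuron.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total))
    return outputs


x = [1.0, 2.0, 3.0]                       # 3 input features
W = [[0.5, -1.0, 0.25], [1.0, 1.0, 1.0]]  # Neurons = 2, three weights each
b = [1.0, -4.0]                           # one bias per neuron
y = dense(x, W, b, relu)
parameter_count = len(W) * len(x) + len(b)
print(y, parameter_count)
```

The parameter count grows as inputs × neurons plus one bias per neuron, which is worth keeping in mind when choosing the Neurons setting.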
Adds a convolution layer, which is the foundation for building a Convolutional Neural Network (CNN). This layer performs a convolution operation on image data (ranging from one to three dimensions) whereby a kernel (filter patch), which is smaller than the image, is passed over the image at some stride (i.e., a set number of pixels at a time) to build a feature map. A CNN is commonly used in computer vision applications to detect features within images. This Component can also be configured to perform a deconvolution by setting the Convolution type parameter to Transpose.
- Convolution type: specifies the type of convolution to use:
- Conv: 2D convolution layer (e.g., spatial convolution over images). See this TensorFlow topic for more information.
- Transpose: transposed 2D convolution layer. Select this option to perform a deconvolution. See this TensorFlow topic for more information.
- Separable: performs depthwise separable 2D convolution. See this TensorFlow topic for more information.
- Depthwise: performs the first step of depthwise separable 2D convolution. See this TensorFlow topic for more information.
- Dimension: specifies the dimension of the input image.
- Patch size: sets the size, in pixels, of the filter patch.
- Stride: sets the number of pixels to move the filter patch over the image.
- Feature maps: sets the number of feature maps to generate.
- Zero-padding: specifies how zero-padding should be used. Padding extends the area of the image to provide more area for the filter to cover the image, potentially leading to more accurate image analysis. Zero-padding can be set to:
- SAME: results in padding evenly to the left/right or up/down of the input such that the output has the same height/width dimension as the input.
- VALID: no padding is to be used.
- Pooling: specifies if a pooling layer should be included. Pooling downsamples a feature map so that small changes in a feature's position (e.g., small movements of the feature) have little effect on the output, while the feature itself is retained.
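The interaction between Patch size, Stride, and Zero-padding can be sketched as follows. The output-size rules follow TensorFlow's documented SAME and VALID conventions; the helper names, the square kernel, and the sample image are assumptions made for illustration only.

```python
import math


def output_size(input_size, patch_size, stride, padding):
    """Output length along one spatial axis for SAME or VALID padding."""
    if padding == "SAME":
        return math.ceil(input_size / stride)
    if padding == "VALID":
        return math.ceil((input_size - patch_size + 1) / stride)
    raise ValueError(f"unknown padding: {padding}")


def convolve_valid(image, kernel, stride):
    """Slide a square `kernel` over a 2D `image` (VALID padding) to build a feature map."""
    k = len(kernel)
    rows = output_size(len(image), k, stride, "VALID")
    cols = output_size(len(image[0]), k, stride, "VALID")
    feature_map = []
    for r in range(rows):
        row = []
        for c in range(cols):
            top, left = r * stride, c * stride
            row.append(sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(k) for j in range(k)
            ))
        feature_map.append(row)
    return feature_map


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, -1]]  # hypothetical 2x2 filter patch
feature_map = convolve_valid(image, kernel, stride=2)
same_size = output_size(224, 3, 2, "SAME")
valid_size = output_size(224, 3, 2, "VALID")
print(feature_map, same_size, valid_size)
```

With a stride of 1, SAME keeps the spatial size of the input, while VALID shrinks it by the patch size minus one.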
Adds a recurrent layer that includes a looping capability such that its input consists of both the data to analyze and the output from a previous calculation performed by that layer. Recurrent layers form the basis of Recurrent Neural Networks (RNNs), effectively providing them with memory (i.e., they can maintain a state across iterations), while their recurrent nature makes RNNs useful for cases involving sequential data like natural language and time series. They're also useful for mapping inputs to outputs of different types and dimensions.
- Neurons: specifies how many neurons the layer consists of.
- Recurrent alternative: specifies which recurrent alternative to use. For additional information, see Recurrent layers.
- Return Sequence: toggles whether all states should be returned. If set to No, only the last state is returned.
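The effect of Return Sequence can be sketched with a one-neuron recurrent layer in plain Python. This is a simplified illustration using a tanh update; the weights, the `simple_rnn` helper, and the input sequence are hypothetical, and other recurrent alternatives use more elaborate update rules.

```python
import math


def simple_rnn(sequence, w_input, w_state, bias, return_sequence):
    """One-neuron recurrent layer: h_t = tanh(w_input * x_t + w_state * h_prev + bias)."""
    state = 0.0
    states = []
    for x in sequence:
        # The new state depends on the current input and the previous state.
        state = math.tanh(w_input * x + w_state * state + bias)
        states.append(state)
    return states if return_sequence else states[-1]


sequence = [0.5, -1.0, 2.0]
all_states = simple_rnn(sequence, w_input=1.0, w_state=0.5, bias=0.0, return_sequence=True)
last_state = simple_rnn(sequence, w_input=1.0, w_state=0.5, bias=0.0, return_sequence=False)
print(all_states)
print(last_state)
```

Returning all states is typically needed when another recurrent layer follows, while the last state alone is usually enough when the sequence is summarized into a single prediction.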
Creates a Keras Applications VGG16 model that can be used for transfer learning when dealing with large-scale images.
Creates a Keras Applications ResNet50 model that can be used for transfer learning. This can be a quicker method than creating a ResNet from scratch. A ResNet50 model is a deep convolutional neural network (CNN) with 50 layers and is commonly used for image classification problems.
Creates a Keras Applications InceptionV3 model that can be used for transfer learning. InceptionV3 is a convolutional neural network (CNN) for assisting in image analysis and object detection.
Creates a Keras Applications MobileNetV2 model that can be used for transfer learning. MobileNetV2 is useful for visual recognition including classification, object detection, and semantic segmentation.
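For the pretrained Components above, the Pooling setting determines the shape of the extracted features when Include top is false. The following pure-Python sketch mimics that reduction on a tiny nested list standing in for a (batch, height, width, channels) tensor; the `global_pool` helper and the sample values are hypothetical and are not part of Keras or PerceptiLabs.

```python
def global_pool(features, mode):
    """Reduce a (batch, height, width, channels) nested list to (batch, channels)."""
    if mode is None:
        return features  # 4D output of the last convolutional block, unchanged
    pooled = []
    for sample in features:
        channels = len(sample[0][0])
        per_channel = []
        for ch in range(channels):
            values = [pixel[ch] for row in sample for pixel in row]
            if mode == "avg":
                per_channel.append(sum(values) / len(values))
            elif mode == "max":
                per_channel.append(max(values))
            else:
                raise ValueError(f"unknown pooling mode: {mode}")
        pooled.append(per_channel)
    return pooled


# One sample, a 2x2 spatial grid, and 2 channels.
features = [[[[1.0, 10.0], [2.0, 20.0]],
             [[3.0, 30.0], [4.0, 40.0]]]]
avg_pooled = global_pool(features, "avg")
max_pooled = global_pool(features, "max")
print(avg_pooled, max_pooled)
```

Both pooled outputs have one value per channel for each sample, which is the 2D shape that a following Dense Component can consume directly.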
- Attention: configures the UNet as an Attention U-Net. Using attention is a way to strengthen relevant activations during training, which in turn makes the model generalize better and waste fewer computations on irrelevant activations.
- Attention Type: specifies whether the attention operation should be additive or multiplicative. Additive is the standard.
- Attention Activation: specifies which activation function the attention cell should use.
- N labels: the number of output channels the UNet should have, and therefore the number of classes the UNet attempts to segment. For binary segmentation this is often 1, so that the background is not treated as its own class, which can be very imbalanced.
- Stack num down: the number of consecutive convolutional layers at each level of the downsampling path.
- Stack num up: the number of consecutive convolutional layers at each level of the upsampling path.
- Pool: specifies a pooling method to be used in the downsampling path.
- Unpool: specifies a method to smooth/interpolate the upsampling in the upsampling path.
- Backbone: allows you to select different backbone models for transfer learning.
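The difference between the two Attention Type options can be sketched for a single position with scalar features. This is an assumption-laden illustration of the gate commonly described for Attention U-Nets, not the Component's actual implementation: the `attention_gate` helper, the scalar weights `w_skip`, `w_gate`, and `w_psi`, and the sample values are all hypothetical.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def relu(value):
    return max(0.0, value)


def attention_gate(skip, gate, w_skip, w_gate, w_psi, attention_type="add", activation=relu):
    """Scale one skip-connection value by an attention coefficient in (0, 1)."""
    projected_skip = w_skip * skip
    projected_gate = w_gate * gate
    if attention_type == "add":
        query = projected_skip + projected_gate   # additive attention
    elif attention_type == "multiply":
        query = projected_skip * projected_gate   # multiplicative attention
    else:
        raise ValueError(f"unknown attention type: {attention_type}")
    # Attention Activation is applied to the combined signal before the sigmoid.
    coefficient = sigmoid(w_psi * activation(query))
    return skip * coefficient, coefficient


additive_output, additive_coef = attention_gate(2.0, 1.0, 0.5, 1.0, 1.0, "add")
multiplicative_output, multiplicative_coef = attention_gate(2.0, 1.0, 0.5, 1.0, 1.0, "multiply")
print(additive_output, additive_coef)
print(multiplicative_output, multiplicative_coef)
```

In both variants the skip-connection value is scaled down rather than replaced, so positions with a small coefficient contribute less to the upsampling path.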