Data Wizard
Last updated
The Data Wizard comprises a series of screens which help you to import your data and get up and running with a basic working neural network model in PerceptiLabs' Modeling Tool.
The data required by PerceptiLabs must consist of:
a .csv file that maps data (e.g., image files) to labels (see CSV File Format). Generally you will manually create this file, but you can also use some of the example files that we've made available (described in the workflow steps below).
(optional) additional raw data (e.g., .png image files) that PerceptiLabs will use as a data source.
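As a rough illustration of the kind of .csv file described above, the following Python sketch builds a small file that maps image paths to labels. The column names (images, labels), the file paths, and the label values are hypothetical and used only for illustration; see CSV File Format for the authoritative layout.

```python
import csv
import io

# Hypothetical samples: these paths and labels are invented for illustration.
rows = [
    ("images/cat_001.png", "cat"),
    ("images/dog_001.png", "dog"),
    ("images/cat_002.png", "cat"),
]


def build_csv(rows):
    """Return CSV text with one header row and one line per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header names are an assumption, not a PerceptiLabs requirement.
    writer.writerow(["images", "labels"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_csv(rows)
print(csv_text)
```

Writing csv_text to a file with a .csv extension gives a file that could then be selected in the Data Wizard, assuming the image paths resolve to real files.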
This topic describes the various settings that you can configure in the Data Wizard.
For information on using the Data Wizard, see Loading and Pre-processing Data.
After you load your .csv file, the Data Wizard requires you to define the data so that PerceptiLabs can prepare to train on it:
Note: This screen is also available when you click Data Settings in the Modeling Tool. However, the Name and Model Path cannot be modified when accessed in this way.
The main elements of this screen are as follows:
Name (available when accessed via the Data Wizard): Uniquely identifies the model and becomes the name of the directory where the model.json file is stored (this is the file that stores your PerceptiLabs model).
Model Path (available when accessed via the Data Wizard): allows you to specify the path where your model is stored. PerceptiLabs generates a subdirectory in that location using the specified model name, and saves the model to a model.json file within that directory every time you save the model. The base path defaults to ~/Users//Documents/PerceptiLabs/Default.
Dataset Column Definitions: allows you to configure each column.
Column Pre-processing Settings (available only for certain data types, e.g., images and numerical data): displays a popup with options to specify how PerceptiLabs should pre-process the data before loading it. See Pre-processing Your Data for more information.
Dataset Column Examples: shows a small sample of the columns and data loaded from the .csv file so you can visualize what the .csv file looks like.
Input/Target: specifies whether the CSV column shown directly above this field represents input or target (classification) data. In the screenshot above, the images column is defined as Input and the labels column as Target. To ignore a column, set it to Do not use.
Data Type: specifies the type of data represented in the column directly above this field. In the screenshot above, the images column is configured as representing image data, and the labels column is configured as representing categorical (i.e., classification) data. The currently available data types are:
Categorical: strings or numbers; they are automatically converted into numbers and OneHot encoded.
Image: loaded as a path to image data; the supported file types are: .jpg, .jpeg, .png, .tif, and .tiff.
Text: string data.
Numerical: numerical data.
Data Partition: partitions the data into three sets:
Training: core training data on which to train the model.
Validation (aka verification data): data used to test model fit during training.
Test: data to test the model against after training, to see how well the trained model handles data it hasn't seen before.
Randomize partition: when enabled, constructs the partitions using a random order of data samples.
Seed: seed used to randomize the partition.
Reload dataset: returns to the .csv file selection screen.
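To make the Categorical data type described above more concrete, here is a minimal sketch of one-hot encoding in plain Python. It is not PerceptiLabs' implementation; the function name one_hot_encode and the choice to order categories alphabetically are assumptions made for illustration.

```python
def one_hot_encode(values):
    """Map each distinct value to an index, then to a one-hot vector."""
    # Assumption: categories are ordered by their string form.
    categories = sorted(set(values), key=str)
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1  # exactly one position is set per sample
        encoded.append(vector)
    return categories, encoded


labels = ["cat", "dog", "cat", "bird"]  # hypothetical label column
categories, encoded = one_hot_encode(labels)
print(categories)
print(encoded)
```

Each encoded vector has one element per distinct category, so a column with three distinct labels yields vectors of length three.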
After completing this configuration, click Create in the bottom right-hand corner to generate a working model.
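The Data Partition, Randomize partition, and Seed settings described above can be pictured with the following sketch. It only illustrates the general idea of a seeded three-way split; the function name partition and the 70/20/10 percentages are assumptions, not PerceptiLabs defaults or internals.

```python
import random


def partition(samples, train_pct=70, validation_pct=20, randomize=True, seed=123):
    """Split samples into training, validation, and test lists.

    Whatever is left after the training and validation shares
    becomes the test set.
    """
    indices = list(range(len(samples)))
    if randomize:
        # A fixed seed makes the shuffled order reproducible.
        random.Random(seed).shuffle(indices)
    n_train = len(indices) * train_pct // 100
    n_validation = len(indices) * validation_pct // 100
    train = [samples[i] for i in indices[:n_train]]
    validation = [samples[i] for i in indices[n_train:n_train + n_validation]]
    test = [samples[i] for i in indices[n_train + n_validation:]]
    return train, validation, test


samples = list(range(10))  # stand-in for ten rows of a .csv file
train, validation, test = partition(samples)
print(len(train), len(validation), len(test))
```

Calling partition twice with the same seed returns the same three sets, while randomize=False keeps the samples in their original file order.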
The pre-processing settings tell PerceptiLabs how to pre-process your data before importing it into your model. For example, you can use them to normalize data, resize images, etc.
Depending on the type of data, some or all of the following may be available:
Normalize: lets you choose the method used to normalize your data, bringing it into a specific range of values. This is useful in most cases, as long as the absolute values themselves are not important.
Random Flip: doubles the size of the dataset and randomly selects specific images to flip.
Resize: resizes the images. Set to Custom to specify the width and height in pixels to which each image is resized, or set to Automatic and select one of the following options:
Dataset mode: determine the mode image size and resize all images to that size.
Dataset mean: determine the average image size and resize all images to that size.
Dataset max: determine the largest image size and resize all images to that size.
Dataset min: determine the smallest image size and resize all images to that size.
Random Rotation (based on RandomRotation layer): randomly rotates some of the images, using one of the following methods to fill the areas that fall outside the boundaries of the original image. Note that Random Rotation also doubles the size of the dataset.
Reflect (d c b a | a b c d | d c b a): the input is extended by reflecting about the edge of the last pixel.
Constant (k k k k | a b c d | k k k k): the input is extended by filling all values beyond the edge with the same constant value k = 0.
Wrap (a b c d | a b c d | a b c d): the input is extended by wrapping around to the opposite edge.
Nearest (a a a a | a b c d | d d d d): the input is extended by the nearest pixel.
Factor can be set to a value between 0 and 2π to specify the maximum rotation. Images are rotated by a random amount between the negative of that value and the value itself. Set Seed to seed the randomness.
Random Crop: randomly crop some of the images to the specified size. This also doubles the size of the dataset.
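The four Random Rotation fill methods listed above (Reflect, Constant, Wrap, Nearest) can be reproduced on a single row of pixels. The sketch below is a plain-Python illustration of those patterns, not the code PerceptiLabs or the RandomRotation layer runs; the function name extend is hypothetical.

```python
def extend(pixels, pad, mode, k=0):
    """Extend a 1-D row of pixels by `pad` values on each side."""
    n = len(pixels)

    def value_at(i):
        if 0 <= i < n:
            return pixels[i]
        if mode == "constant":
            return k  # every value beyond the edge is the constant k
        if mode == "nearest":
            return pixels[0] if i < 0 else pixels[-1]
        if mode == "wrap":
            return pixels[i % n]  # continue from the opposite edge
        if mode == "reflect":
            j = i % (2 * n)  # mirror about the edge of the last pixel
            return pixels[j] if j < n else pixels[2 * n - 1 - j]
        raise ValueError(f"unknown mode: {mode}")

    return [value_at(i) for i in range(-pad, n + pad)]


row = ["a", "b", "c", "d"]
for mode in ("reflect", "constant", "wrap", "nearest"):
    print(mode, " ".join(str(v) for v in extend(row, 4, mode)))
```

With a padding of four, each printed row corresponds to the pattern shown in parentheses next to the matching method, with the original a b c d in the middle.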
Click Save (6) to save your settings.
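As an illustration of the Automatic resize options described earlier (Dataset mode, mean, max, and min), the sketch below derives a target size from a list of image sizes. How PerceptiLabs actually measures image size is not stated here, so this assumes that the mode is the most common (width, height) pair and that the mean, max, and min are computed separately for width and height. The function name target_size and the sample sizes are hypothetical.

```python
from collections import Counter


def target_size(sizes, option):
    """Return the (width, height) that all images would be resized to."""
    widths = [width for width, _ in sizes]
    heights = [height for _, height in sizes]
    if option == "mode":
        # Assumption: the most frequent (width, height) pair.
        return Counter(sizes).most_common(1)[0][0]
    if option == "mean":
        # Assumption: average each dimension, rounded to whole pixels.
        return (round(sum(widths) / len(widths)),
                round(sum(heights) / len(heights)))
    if option == "max":
        return (max(widths), max(heights))
    if option == "min":
        return (min(widths), min(heights))
    raise ValueError(f"unknown option: {option}")


sizes = [(64, 64), (64, 64), (128, 96), (32, 48)]  # hypothetical dataset
for option in ("mode", "mean", "max", "min"):
    print(option, target_size(sizes, option))
```

Larger targets such as the max option preserve more detail from big images at the cost of upscaling small ones, while the min option does the reverse.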
After you've defined your dataset, you then specify the initial training settings to use. See Model Training Settings for a description of each field.