Types of Tests
Depending on the type(s) of models and the options selected in the test configuration, the following test results may be available:
The Confusion Matrix lets you see, at a glance, how predicted classifications compare to actual classifications, on a per-class level:
The color of each cell reflects the number of samples with that combination of actual and predicted class. Cells on the diagonal hold correctly classified samples; samples off the diagonal were misclassified as another class, which makes it easy to see whether some classes are classified more reliably than others. The matrix also exposes class imbalance: if one class has many samples and another has few, the test dataset is unbalanced.
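As a rough illustration of what the matrix contains, the sketch below builds one by hand in plain Python. The function name, the sample labels, and the row/column orientation (rows are actual classes, columns are predicted classes) are assumptions made for this example, not a description of how the tool computes or lays out its matrix.

```python
def confusion_matrix(actual, predicted, num_classes):
    """Count samples per (actual, predicted) pair.

    Assumed layout: rows are actual classes, columns are predicted classes.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Hypothetical labels for a 3-class test set (class 1 is under-represented).
actual = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 1, 0, 1, 2, 2, 2, 0, 2]

cm = confusion_matrix(actual, predicted, num_classes=3)

# Diagonal cells are correct classifications; everything else is an error.
correct = sum(cm[i][i] for i in range(3))
misclassified = len(actual) - correct

# Row totals give the number of test samples per actual class.
samples_per_class = [sum(row) for row in cm]

for row in cm:
    print(row)
```

Here the row totals (4, 2, 4) reveal the imbalance, and the off-diagonal counts show which classes get confused with which.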
The Classification Metrics pane (aka Label Metrics Table) displays information for classification models:
Model Name: the name of the model for which the metrics apply. Multiple models will be listed if tests were performed on more than one model.
Categorical Accuracy: the accuracy for each category, averaged over all categories.
Top K Categorical Accuracy: frequency of the correct category among the top K predicted categories.
Precision: the fraction of positive predictions that are correct.
Recall: the percentage of actual positives that the model finds (i.e., positives that are not misclassified as negatives).
Tip: Precision and Recall are often more telling than accuracy, especially for unbalanced datasets, and you generally want both to be as high as possible. Together they show whether as many classes as possible are correctly classified as often as possible.
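The metrics above can be sketched in a few lines of plain Python. This is a minimal illustration under common textbook definitions, not the tool's own implementation: accuracy is taken here as the fraction of samples whose predicted class matches the actual class, and all function names and sample data are hypothetical.

```python
def categorical_accuracy(actual, predicted):
    """Fraction of samples whose predicted class equals the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def top_k_accuracy(actual, scores, k):
    """Fraction of samples whose actual class is among the k highest scores."""
    hits = 0
    for a, row in zip(actual, scores):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += a in ranked[:k]
    return hits / len(actual)


def precision_recall(actual, predicted, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical unbalanced test set: eight samples of class 0, two of class 1.
actual = [0] * 8 + [1] * 2
predicted = [0] * 8 + [1, 0]

accuracy = categorical_accuracy(actual, predicted)
precision, recall = precision_recall(actual, predicted, positive=1)

# Hypothetical per-class scores for three samples of a 3-class model.
actual_k = [0, 1, 2]
scores = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.5, 0.3, 0.2]]
top1 = top_k_accuracy(actual_k, scores, k=1)
top2 = top_k_accuracy(actual_k, scores, k=2)
```

In this example the accuracy is 0.9, yet the recall for class 1 is only 0.5 because half of its samples were missed, which is the situation the tip warns about.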
The Segmentation Metrics pane displays information for image segmentation models:
Model Name: the name of the model for which the metrics apply. Multiple models will be listed if tests were performed on more than one model.
Intersection over Union: measures how much the objects in the output overlap those in the ground truth, calculated as the area of overlap divided by the area of union. It goes beyond pixel accuracy, which can be skewed when an image has far more background pixels than object pixels. For more information see Intersection over Union (IoU) for object detection.
Dice coefficient: measures the overlap between the predicted and ground truth samples, ranging from 0 (no overlap) to 1 (perfect overlap). For more information see Sørensen–Dice coefficient.
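To make the two segmentation metrics concrete, the sketch below computes them for a pair of small binary masks using the standard formulas, IoU = |A ∩ B| / |A ∪ B| and Dice = 2|A ∩ B| / (|A| + |B|). The function name, the masks, and the choice to return 1.0 when both masks are empty are assumptions for this example; the tool's own calculation (for instance, how it handles multiple classes) may differ.

```python
def iou_and_dice(pred_mask, true_mask):
    """IoU and Dice for two binary masks given as equally sized 2D lists."""
    pred = [px for row in pred_mask for px in row]
    true = [px for row in true_mask for px in row]

    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    pred_area = sum(1 for p in pred if p)
    true_area = sum(1 for t in true if t)
    union = pred_area + true_area - intersection

    # Assumption: two empty masks count as a perfect match.
    iou = intersection / union if union else 1.0
    total = pred_area + true_area
    dice = 2 * intersection / total if total else 1.0
    return iou, dice


# Hypothetical 4x4 masks: the prediction covers the object plus two extra pixels.
true_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

iou, dice = iou_and_dice(pred_mask, true_mask)

# Pixel accuracy for comparison: background pixels dominate the score.
flat_pred = [px for row in pred_mask for px in row]
flat_true = [px for row in true_mask for px in row]
pixel_accuracy = sum(p == t for p, t in zip(flat_pred, flat_true)) / len(flat_true)
```

For these masks the pixel accuracy is 0.875, while IoU is about 0.67 and Dice is 0.8, showing how the overlap-based metrics are less flattered by correctly predicted background.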
The Output Visualization pane displays visualizations of the input data and the final transformed target data:
You can hover the mouse over this pane to display < > buttons and a scrollbar, for navigating through each data sample.
To learn more about what to watch out for during testing, see Common Testing Issues.