Using the Exported/Deployed Model
This topic provides information about the different formats to which you can export/deploy your models:

TensorFlow

When exporting a TensorFlow model to TensorFlow's SavedModel format (i.e., TensorFlow Model was selected with no Compress options), a directory will be created in the location that you specified in the Save to field, with the same name as your model. The structure of the directory will look as follows:
/mymodel
    /assets
    /variables
        variables.data-00000-of-00001
        variables.index
    keras_metadata.pb
    saved_model.pb
To use the exported model for inference, take the entire directory and use it for serving the model. This structure is standard for TensorFlow serving. The variables subdirectory contains a standard training checkpoint, which is needed to load the model unless it's frozen. A frozen model (or frozen graph) is a minimized model that can only be used for inference: all of the variables needed for training are removed, and the variables that remain are stored together with their definitions in a single protobuf (.pb) file. Note that TensorFlow 2.0 no longer generates frozen graph models.
The exported model also includes all of the pre-processing options you specified in the Data Wizard. This means that you can pass raw, non-preprocessed data to the exported model for inference, and the model will pre-process the data for you, in the same manner as when you built and trained the model in PerceptiLabs.
For additional information see Serving a TensorFlow Model.
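For example, loading the exported directory and running inference might look like the minimal sketch below. The model directory name and the dummy input shape are assumptions for illustration; the inputs your model expects depend on how you configured it in PerceptiLabs.
import numpy as np
import tensorflow as tf

# Path to the exported SavedModel directory (assumed name for illustration).
MODEL_DIR = "mymodel"

# Load the model, including the pre-processing configured in the Data Wizard.
model = tf.keras.models.load_model(MODEL_DIR)

# Dummy input: a single 224x224 RGB image. Replace with your own raw data;
# the expected shape depends on how you set up your model.
raw_input = np.random.rand(1, 224, 224, 3).astype(np.float32)

predictions = model.predict(raw_input)
print(predictions)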

TensorFlow Lite

If you selected TensorFlow Model and either of the Compress options, then the model is output to TensorFlow Lite format and the directory structure for the exported model will look as follows:
/mymodel
    quantized_model.tflite
For additional information see TensorFlow Lite inference.
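As a rough illustration, inference with the quantized model might look like the following sketch. The file path is an assumption, and the input here is random data shaped to whatever the interpreter reports; substitute your own raw data.
import numpy as np
import tensorflow as tf

# Load the quantized model (path is an assumption for illustration).
interpreter = tf.lite.Interpreter(model_path="mymodel/quantized_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build a dummy input matching the interpreter's reported shape and dtype.
input_shape = input_details[0]["shape"]
dummy_input = np.random.rand(*input_shape).astype(input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()

predictions = interpreter.get_tensor(output_details[0]["index"])
print(predictions)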

FastAPI Server

Selecting FastAPI Server exports a standard TensorFlow model, a Python server app that exposes a simple REST API for running inference on that model, and an example Python client app that invokes that API. The server runs locally on port 8181 and exposes an endpoint called /predict that returns a classification for a given piece of data.

Output Structure

The structure of the directory will look as follows:
/mymodel
    /assets
    /variables
        variables.data-00000-of-00001
        variables.index
    example.csv
    example.json
    fastapi_example_requirements.txt
    fastapi_example.py
    fastapi_requirements.txt
    fastapi_server.py
    keras_metadata.pb
    saved_model.pb
The example.json file contains sample input data (e.g., image data) and the example.csv file contains a copy of the data/label mappings from the Data Settings. These are used in the Python code as described below.
The output also includes the standard TensorFlow files as described in the TensorFlow section above, as well as Python source code (.py) and requirement (.txt) files for the app:
  • fastapi_server.py: shows how to set up a simple FastAPI server that hosts a model and exposes a /predict endpoint to run inference. The app loads data from example.json and uses it to display example data as part of the auto-generated API documentation when the server runs. More detail is provided below.
  • fastapi_example.py: shows a simple client app that sends a request to the server's /predict endpoint for each row listed in example.csv.
For additional information see the FastAPI website.
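To give a sense of the overall pattern, a heavily simplified server along these lines is sketched below. This is not the generated fastapi_server.py (which also handles your model's pre-processing, example data, and payload schema); the model path and the request payload shown here are assumptions for illustration.
from typing import List

import numpy as np
import tensorflow as tf
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the exported SavedModel once at startup (path is an assumption).
model = tf.keras.models.load_model("mymodel")


class PredictRequest(BaseModel):
    # Hypothetical payload: a batch of flat feature vectors. The generated
    # server defines its own schema based on your Data Wizard settings.
    inputs: List[List[float]]


@app.post("/predict")
def predict(request: PredictRequest):
    data = np.array(request.inputs, dtype=np.float32)
    predictions = model.predict(data)
    return {"predictions": predictions.tolist()}


if __name__ == "__main__":
    # The exported server listens on port 8181, as described above.
    uvicorn.run(app, host="0.0.0.0", port=8181)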

Using the Example Apps

The server example app (fastapi_server.py) runs a local web server that exposes the /predict endpoint and displays auto-generated REST API documentation in your browser. The client example app (fastapi_example.py) sends a request to the /predict endpoint running locally on your machine.
Tip: The example server and client have different requirements (i.e., Python dependencies) so it's best practice to run them in their own virtual environments.
Follow the steps below to use these example apps:
1) Navigate via the command line into the FastAPI directory that PerceptiLabs exported to your local machine.
2) (Optional) Create a virtual environment for the server.
3) Run the following command to ensure that the server's dependencies (e.g., FastAPI, Pillow, and Uvicorn) are installed:
  • pip install -r fastapi_requirements.txt
4) Run the server:
  • python fastapi_server.py
The output will look similar to the following:
HINT: go to http://localhost:8181/docs to interact with the API. Further docs are available at http://localhost:8181/redoc
2021-10-29 09:48:35.465309: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-29 09:48:35.466410: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
INFO: Started server process [8455]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8181 (Press CTRL+C to quit)
5) Open a web browser and navigate to http://localhost:8181/docs to see the auto-generated documentation for the server. It will look similar to the following:
Auto-generated REST documentation displayed by the server app
6) Open a new command-line window, activate or create a virtual environment for the client if you're using virtual environments, and navigate into the FastAPI directory that PerceptiLabs exported to your local machine. Then install the client's dependencies:
  • pip install -r fastapi_example_requirements.txt
7) Run the example client app:
  • python fastapi_example.py
The app will send a request to the server's /predict endpoint and the output will look similar to the following:
{'labels': ['yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'no', 'no']}
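For reference, a bare-bones client along the same lines might look like the sketch below. The payload here simply forwards the contents of example.json; the exact request format expected by the generated server may differ, so treat fastapi_example.py as the authoritative reference.
import json

import requests

# Load the example payload that was exported alongside the model.
with open("example.json") as f:
    payload = json.load(f)

# Send it to the locally running server (port 8181, /predict endpoint).
response = requests.post("http://localhost:8181/predict", json=payload)
response.raise_for_status()

print(response.json())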

Gradio

Deploying as Gradio starts a local Gradio server and then opens Gradio as an app in a new tab in your web browser:
Example of a Gradio app deployed by PerceptiLabs.
Note: You may need to enable popups in your web browser in order for the Gradio app to open in a new tab.
This Gradio browser app makes it easy to test and showcase the inference capabilities of your model. You can load in different data and see how the model reacts to it, making it ideal both for your own experiments and for letting other people try the model out. The Gradio app automatically customizes itself to the kind of model you are using, so the inputs and outputs you see are always relevant.
For more information check out the Gradio website. We also encourage you to share screenshots of your results on the PerceptiLabs forums if you build something cool!
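If you're curious what such a Gradio app looks like under the hood, the minimal sketch below shows the general pattern for an image classifier. The model path, input handling, and label names are assumptions for illustration; the app PerceptiLabs deploys adapts its interface to your specific model automatically.
import gradio as gr
import numpy as np
import tensorflow as tf

# Assumed model path and labels for illustration only.
model = tf.keras.models.load_model("mymodel")
LABELS = ["no", "yes"]


def classify(image):
    # Gradio passes the image as a NumPy array; add a batch dimension.
    batch = np.expand_dims(image.astype(np.float32), axis=0)
    probabilities = model.predict(batch)[0]
    return {label: float(p) for label, p in zip(LABELS, probabilities)}


# A simple interface: image in, label probabilities out.
demo = gr.Interface(fn=classify, inputs=gr.Image(), outputs=gr.Label())
demo.launch(inbrowser=True)  # start a local server and open the app in a browser tab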