How to Convert Your Keras Model to ONNX
Intuition
I love Keras for its simplicity. In about 10 minutes, I can build a deep learning model with its sequential or functional API in elegant code. However, Keras loads its models very slowly. Moreover, I sometimes have to use another deep learning framework because of system constraints, or simply because my boss tells me to use that framework. Though I could convert my Keras model to other frameworks with someone else's script, I still convert my model to ONNX to try out its claimed interoperability across AI tools.
What is ONNX?
ONNX is an abbreviation of “Open Neural Network Exchange”. Created by Facebook and Microsoft, ONNX aims to be an open format for representing deep learning models so that we can move models between frameworks with ease.
Converting Your Keras Model to ONNX
- Download the example code from my GitHub
- Download the pre-trained weights from here
- Type the following commands to set up
$ conda create -n keras2onnx-example python=3.6 pip
$ conda activate keras2onnx-example
$ pip install -r requirements.txt
- Run this command to convert the pre-trained Keras model to ONNX
$ python convert_keras_to_onnx.py
- Run this command to run inference with the ONNX runtime
$ python main.py 3_001_0.bmp
It should output the following messages in the end (the top-5 probabilities with their class labels):
...
3_001_0.bmp
1.000 3
0.000 37
0.000 42
0.000 14
0.000 17
Some Explanations
convert_keras_to_onnx.py converts a Keras .h5 model to ONNX format, i.e., a .onnx file. Its code is shown below:
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.models import load_model
import onnx
import keras2onnx

onnx_model_name = 'fish-resnet50.onnx'

# Load the trained Keras model, convert it to an ONNX model, and save it
model = load_model('model-resnet50-final.h5')
onnx_model = keras2onnx.convert_keras(model, model.name)
onnx.save_model(onnx_model, onnx_model_name)
There are some points to note when converting a Keras model to ONNX:
- Remember to import the onnx and keras2onnx packages.
- The keras2onnx.convert_keras() function converts the Keras model to an ONNX object.
- The onnx.save_model() function saves the ONNX object to a .onnx file.
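If you want to sanity-check the generated file, you can load it back and run the ONNX checker. This is a small sketch of my own (it is not part of convert_keras_to_onnx.py), using only the standard onnx APIs:

import onnx

# Load the ONNX file saved above and verify that its graph is well formed
onnx_model = onnx.load('fish-resnet50.onnx')
onnx.checker.check_model(onnx_model)  # raises an exception if the model is invalid

# Print a human-readable summary of the graph
print(onnx.helper.printable_graph(onnx_model.graph))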
main.py runs inference on a fish image using the ONNX model. I paste its code here:
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.models import load_model
from tensorflow.python.keras.preprocessing import image
import sys
import numpy as np
import keras2onnx
import onnxruntime
IMAGE_SIZE = 224
img_path = sys.argv[1]
#net = load_model('model-resnet50-final.h5')
#onnx_net = keras2onnx.convert_keras(net, net.name)
onnx_model = 'fish-resnet50.onnx'
# 41 + 1 classes
cls_list = ['10', '11', '12', '13', '14', '15', '16', '17', '18', '19',
'1', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29',
'2', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39',
'3', '40', '41', '42', '4', '5', '6', '7', '8', '9']
# image preprocessing
img = image.load_img(img_path, target_size=(IMAGE_SIZE, IMAGE_SIZE))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
# runtime prediction
#content = onnx_net.SerializeToString()
#sess = onnxruntime.InferenceSession(content)
sess = onnxruntime.InferenceSession(onnx_model)
x = x if isinstance(x, list) else [x]
feed = dict([(input.name, x[n]) for n, input in enumerate(sess.get_inputs())])
pred_onnx = sess.run(None, feed)[0]
pred = np.squeeze(pred_onnx)         # drop the batch dimension
top_inds = pred.argsort()[::-1][:5]  # indices of the top-5 probabilities
print(img_path)
for i in top_inds:
    print(' {:.3f} {}'.format(pred[i], cls_list[i]))
There are some points to note when inferencing:
- Remember to import the onnxruntime package.
- The onnxruntime.InferenceSession() function loads the ONNX model.
- The run() method of the session predicts the image and returns the result; sess.run(None, feed)[0] takes the 0-th element of that result as a numpy matrix.
- np.squeeze(pred_onnx) squeezes the numpy matrix to a numpy vector, i.e., removes the 0-th dimension, so that we can get the probability of each class.
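One more note: the feed dictionary is built from sess.get_inputs() because the ONNX runtime addresses inputs by name. If you are unsure what a model expects, you can inspect the session directly. Here is a small sketch of my own, assuming the fish-resnet50.onnx file generated above:

import onnxruntime

sess = onnxruntime.InferenceSession('fish-resnet50.onnx')

# Every model input/output carries a name, shape, and element type
for inp in sess.get_inputs():
    print('input: ', inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print('output:', out.name, out.shape, out.type)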
Inference Time
Total Inference Time (load model + inference)
Some reports say ONNX is faster in both model loading time and inference time. Hence, I ran an inference time experiment on my laptop, described in this section.
The hardware for this experiment is shown below:
- CPU: Core i5-3230M
- RAM: 16GB
The software and packages are shown here:
- OS: CentOS 7.6
- Programming language: Python 3.6
- Packages
- keras version 2.2.4
- tensorflow version 1.13.1
- onnxruntime
Each inference runs three times to reduce the error from other factors, e.g., context switching.
For inference with Keras, my computer produces the following results:
$ time python resnet50_predict.py 3_001_0.bmp
# run three times
...
real 0m37.801s
user 0m37.254s
sys 0m1.590s
...
real 0m35.558s
user 0m35.838s
sys 0m1.362s
...
real 0m36.444s
user 0m36.542s
sys 0m1.418s
The inference time is about (37.801+35.558+36.444)/3 = 36.60 seconds (rounded to two decimal places).
In contrast, inferencing with the ONNX runtime shows:
$ time python main.py 3_001_0.bmp
# run three times
...
real 0m2.576s
user 0m2.919s
sys 0m0.759s
...
real 0m2.530s
user 0m2.931s
sys 0m0.700s
...
real 0m2.560s
user 0m2.944s
sys 0m0.710s
Whoa! What a huge improvement! The inference time is about (2.576+2.530+2.560)/3 = 2.56 seconds.
The code for inferencing with Keras can be found in my GitHub repo.
Inference Time (only inference)
Edit: One of my friends said I should compare only the inference time between Keras and ONNX, because in practice we load the model only once. As a result, I test only the inference time between Keras and ONNX, splitting the comparison into two parts:
- Keras (with TensorFlow installed by pip) vs. ONNX
- Keras (with TensorFlow installed by conda) vs. ONNX
Of course, I wrote comparison.py to run the comparison test, shown below:
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.models import load_model
from tensorflow.python.keras.preprocessing import image
import sys
import numpy as np
import onnxruntime
import time
IMAGE_SIZE = 224
loop_count = 10
img_path = sys.argv[1]
# load Keras and ONNX model
net = load_model('model-resnet50-final.h5')
onnx_model = 'fish-resnet50.onnx'
sess = onnxruntime.InferenceSession(onnx_model)
# 41 + 1 classes
cls_list = ['10', '11', '12', '13', '14', '15', '16', '17', '18', '19',
'1', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29',
'2', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39',
'3', '40', '41', '42', '4', '5', '6', '7', '8', '9']
# image preprocessing
img = image.load_img(img_path, target_size=(IMAGE_SIZE, IMAGE_SIZE))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
# Keras prediction
start_time = time.time()
for i in range(loop_count):
    pred = net.predict(x)[0]
print("Keras inferences with %s second in average" %((time.time() - start_time) / loop_count))
# ONNX prediction
x = x if isinstance(x, list) else [x]
feed = dict([(input.name, x[n]) for n, input in enumerate(sess.get_inputs())])
start_time = time.time()
for i in range(loop_count):
    pred_onnx = sess.run(None, feed)[0]
print("ONNX inferences with %s second in average" %((time.time() - start_time) / loop_count))
You can see I run each inference method 10 times and take the average time, and I run comparison.py three times to reduce the error.
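As a side note, Python's built-in timeit module captures the same measurement pattern more compactly. This is just a sketch of mine, reusing the sess and feed variables from comparison.py rather than anything in the original script:

import timeit

# Average wall-clock time over 10 ONNX inference runs, like the loop in comparison.py
avg = timeit.timeit(lambda: sess.run(None, feed), number=10) / 10
print("ONNX inferences with %s second in average" % avg)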
Keras (with TensorFlow installed by pip) vs. ONNX
The comparison is shown below:
$ python comparison.py 3_001_0.bmp
...
Keras inferences with 0.8759469270706177 second in average
ONNX inferences with 0.3100883007049561 second in average
...
Keras inferences with 0.8891681671142578 second in average
ONNX inferences with 0.313812255859375 second in average
...
Keras inferences with 0.9052883148193359 second in average
ONNX inferences with 0.3306725025177002 second in average
We find that Keras inference needs (0.88+0.89+0.91)/3 = 0.89 seconds, while ONNX inference needs (0.31+0.31+0.33)/3 = 0.32 seconds. That's a speedup ratio of 0.89/0.32 = 2.78x of ONNX over Keras.
Keras (with TensorFlow installed by conda) vs. ONNX
Wait a minute! pip install tensorflow installs TensorFlow without the optimizations for Intel processors. So let's remove TensorFlow first and then install it via conda (the version I install is 1.13.1).
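The exact commands depend on your setup, but inside the keras2onnx-example environment the reinstall looks roughly like this:

$ pip uninstall tensorflow          # remove the pip-installed build
$ conda install tensorflow=1.13.1   # install the conda build with Intel optimizations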
Then run comparison.py again:
$ python comparison.py 3_001_0.bmp
...
Keras inferences with 0.9810404300689697 second in average
ONNX inferences with 0.604683232307434 second in average
...
Keras inferences with 0.8862279415130615 second in average
ONNX inferences with 0.6059059381484986 second in average
...
Keras inferences with 0.9496192932128906 second in average
ONNX inferences with 0.5927849292755127 second in average
We find Keras takes (0.98+0.89+0.95)/3 = 0.94 seconds for inference, while ONNX spends (0.60+0.61+0.59)/3 = 0.60 seconds. That's a speedup of 0.94/0.60 = 1.57x. Interestingly, both Keras and ONNX become slower after installing TensorFlow via conda.
Conclusion
In this post, I gave an introduction to ONNX and showed how to convert your Keras model to an ONNX model. I also demonstrated how to make predictions with an ONNX model and compared the inference times. Hope you enjoy this post!