I am running the yolov4 object detection model on raspberry pi 4B and jetson nano. I have to record the inference time. I am using 12 images for evaluation. how can I record the inference time of this model? Is there anything for calculating the inference time?
Hi #MohammadSaeidAnwar you can use the time module at the starting of your inference before you start your inference to calculate the time.
For example
import time
begin = time.time()
time.sleep(5)
end = time.time()
print('Time taken:', round(end-begin, 2))
#output:Time taken: 5.01
Thankyou.
Related
TLDR:
Short term: Trying to quantize a specific portion of a TF model (recreated from a TFLite model). Skip to pictures below. \
Long term: Transfer Learn on Yamnet and compile for Edge TPU.
Source code to follow along is here
I've been trying to transfer learn on Yamnet and compile for a Coral Edge TPU for a few weeks now.
Started here, but quickly realized that model wouldn't quantize and compile for the Edge TPU because of the dynamic input and out of the box TFLite quantization doesn't work well with the preprocessing of audio before Yamnet's MobileNet.
After tinkering and learning for a few weeks, I found a Yamnet model compiled for the Edge TPU (sadly without source code) and figured my best shot would be to try to recreate it in TF, then quantize, then compile to TFLite, then compile for the edge TPU. I'll also have to figure out how to set the weights - not sure if I have to/can do that pre or post quantization. Anyway, I've effectively recreated the model, but am having a hard time quantizing without a bunch of wacky behavior.
The model currently looks like this:
I want it to look like this:
For quantizing, I tried:
TFLite Model Optimization which puts tfl.quantize ops all over the place and fails to compile for the Edge TPU.
Quantization Aware Training which throws some annoying errors that I've been trying to work through.
If you know a better way to achieve the long term goal than what I proposed, please (please please please) share! Otherwise, help on specific quant ops would be great! Also, reach out for clarity
I've ran into your same issues trying to convert the Yamnet model by tensorflow into full integers in order to compile it for Coral edgetpu and I think I've found a workaround for that.
I've been trying to stick to the tutorials posted in the section tflite-model-maker and finding a solution within this API because, for experience, I found it to be a very powerful tool.
If your goal is to build a model which is fully compiled for the edgetpu (meaning all layers, including input and output ones, being converted to int8 type) I'm afraid this solution won't fit for you. But since you posted you're trying to obtain a custom model with the same structure of:
Yamnet model compiled for the Edge TPU
then I think this workaround would help you.
When you train your custom model following the basic tutorial it is possible to export the custom model both in .tflite format
model.export(models_path, tflite_filename='my_birds_model.tflite')
and full tensorflow model:
model.export(models_path, export_format=[mm.ExportFormat.SAVED_MODEL, mm.ExportFormat.LABEL])
Then it is possible to convert the full tensorflow saved model to tflite format by using the following script:
import tensorflow as tf
import numpy as np
import glob
from scipy.io import wavfile
dataset_path = '/path/to/DATASET/testing/*/*.wav'
representative_data = []
saved_model_path = './saved_model'
samples = glob.glob(dataset_path)
input_size = 15600 #Yamnet model's input size
def representative_data_gen():
for input_value in samples:
sample_rate, audio_data = wavfile.read(input_value, 'rb')
audio_data = np.array(audio_data)
splitted_audio_data = tf.signal.frame(audio_data, input_size, input_size, pad_end=True, pad_value=0) / tf.int16.max #normalization in [-1,+1] range
yield [np.float32(splitted_audio_data[0])]
tf.compat.v1.enable_eager_execution()
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
converter.experimental_new_converter = True #if you're using tensorflow<=2.2
converter.optimizations = [tf.lite.Optimize.DEFAULT]
#converter.inference_input_type = tf.uint8 # or tf.uint8
#converter.inference_output_type = tf.uint8 # or tf.uint8
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()
open(saved_model_path + "converted_model.tflite", "wb").write(tflite_model)
As you can see, the lines which tell the converter to change input/output type are commented. This is because Yamnet model expects in input normalized values of audio sample in the range [-1,+1] and the numerical representation must be float32 type. In fact the compiled model of Yamnet you posted uses the same dtype for input and output layers (float32).
That being said you will end up with a tflite model converted from the full tensorflow model produced by tflite-model-maker. The script will end with the following line:
fully_quantize: 0, inference_type: 6, input_inference_type: 0, output_inference_type: 0
and the inference_type: 6 tells you the inference operations are suitable for being compiled to coral edgetpu.
The last step is to compile the model. If you compile the model with the standard edgetpu_compiler command line :
edgetpu_compiler -s converted_model.tflite
the final model would have only 4 operations which run on the EdgeTPU:
Number of operations that will run on Edge TPU: 4
Number of operations that will run on CPU: 53
You have to add the optional flag -a which enables multiple subgraphs (it is in experimental stage though)
edgetpu_compiler -sa converted_model.tflite
After this you will have:
Number of operations that will run on Edge TPU: 44
Number of operations that will run on CPU: 13
And most of the model operations will be mapped to edgetpu, namely:
Operator Count Status
MUL 1 Mapped to Edge TPU
DEQUANTIZE 4 Operation is working on an unsupported data type
SOFTMAX 1 Mapped to Edge TPU
GATHER 2 Operation not supported
COMPLEX_ABS 1 Operation is working on an unsupported data type
FULLY_CONNECTED 3 Mapped to Edge TPU
LOG 1 Operation is working on an unsupported data type
CONV_2D 14 Mapped to Edge TPU
RFFT2D 1 Operation is working on an unsupported data type
LOGISTIC 1 Mapped to Edge TPU
QUANTIZE 3 Operation is otherwise supported, but not mapped due to some unspecified limitation
DEPTHWISE_CONV_2D 13 Mapped to Edge TPU
MEAN 1 Mapped to Edge TPU
STRIDED_SLICE 2 Mapped to Edge TPU
PAD 2 Mapped to Edge TPU
RESHAPE 1 Operation is working on an unsupported data type
RESHAPE 6 Mapped to Edge TPU
I trained a model using tf==2.5.0.
Advised to use model(x) instead of model.predict(x) speeds up the inference time. As I understand, model.predict(x) is used for predicting the tensors in batches. For example, model.predict(x) takes 1 second, whereas model(x) takes .02 seconds for executing a single image for detection. When I convert tf model to tfjs, the prediction takes nearly 1 second.
Checking TFJS API, it has only two functionalities for prediction namely
predict (x, args?)
predictOnBatch (x)
There is no luck, both consume same amount of time (1 second on average).
On the other hand, when the same model is converted to tflite, there is no reduction in time.
Environment:
Python3.7
Tensorflow: 2.5.0
Any suggestion is appreciated.
my situation is that saving model is extremely slow under Colab TPU environment.
I first encountered this issue when using checkpoint callback, which causes the training stuck at the end of the 1st epoch.
Then, I tried taking out callback and just save the model using model.save_weights(), but nothing has changed. By using Colab terminal, I found that the saving speed is about ~100k for 5 minutes.
The version of Tensorflow = 2.3
My code of model fitting is here:
with tpu_strategy.scope(): # creating the model in the TPUStrategy scope means we will train the model on the TPU
Baseline = create_model()
checkpoint = keras.callbacks.ModelCheckpoint('baseline_{epoch:03d}.h5',
save_weights_only=True, save_freq="epoch")
hist = model.fit(get_train_ds().repeat(),
steps_per_epoch = 100,
epochs = 5,
verbose = 1,
callbacks = [checkpoint])
model.save_weights("epoch-test.h5", overwrite=True)
I found the issue happened because I explicitly switched to graph mode by writing
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()
Before
with tpu_strategy.scope():
model.fit(...)
Though I still don't understand the cause, remove disable_eager_execution solved the issue.
I have a regular model and I use tf.lite.TFLiteConverter.from_keras_model_file to convert it to .tflite model. Ane then I use interpreter to do the inference of images.
tf.logging.set_verbosity(tf.logging.DEBUG)
interpreter = tf.lite.Interpreter(model_path)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index= interpreter.get_output_details()[0]["index"]
for loop:
(read image)
interpreter.set_tensor(input_index, image)
interpreter.invoke()
result = interpreter.get_tensor(output_index)
With the regular model, I use following to do the prediction.
model = keras.models.load_model({h5 model path}, custom_objects={'loss':loss})
for loop:
(read image)
result = model.predict(image)
However, the elapsed time on inference .tflite model is much longer than the regular. I also try the post-training quantization on the .tflite, but this model is the slowest one compared with the other two. Does it make sense? Why this happens? Is it any way to make the tensorflow lite model faster than its regular one? Thanks.
So let's say I have a trained Keras neural network called model. I want to save the model before further training so I can go back to that checkpoint as I train it on different datasets.
model.save('checkpoint_1.h5')
model.fit(data_1, labels_1)
model.save('checkpoint_2.h5')
Now here's where my question comes in. I want to free up the GPU memory so I can reload checkpoint_1 before further training of the model. What I'm currently doing is ending the current tensorflow session and starting a new one.
from keras import backend as K
#End the current and start a new tensorflow session to free up gpu memory
#to allow the next nn2 to be trained.
K.get_session().close()
sess = tf.Session()
K.set_session(sess)
I then load checkpoint_1 and continue training.
model = load_model('checkpoint_1.h5')
model.fit(data_2, labels_2)
Is there a better way of doing this? Stopping and starting the tensorflow sessions takes a lot of time.