Image decoded/encoded/decoded in Tensorflow is not the same as the original

I store images in TFRecord files to train an image classification model with TensorFlow 2.10. The TFRecords are read into a dataset, on which I call fit(). After training, I run inference on (1) an image from the dataset and (2) the same image read from disk.
The predictions are not the same, because in the first case the image undergoes a decode/encode/decode transformation (TF functions tf.io.decode_jpeg and tf.io.encode_jpeg) while the TFRecord file is written and then read back to build the dataset. That transformation is not symmetrical: the image after the round trip is not the same as the original image, even if I encode with quality=100.
It can make a difference: in the first case the correct class is not in the top 3; in the second case it is.
Is there any way to avoid this asymmetrical behavior?
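A minimal sketch of the effect described above, assuming TensorFlow 2.x and a synthetic random image in place of the asker's data. It pushes the same pixels through a JPEG round trip (tf.io.encode_jpeg / tf.io.decode_jpeg) and through a PNG round trip (tf.io.encode_png / tf.io.decode_png). JPEG is lossy even at quality=100, while PNG is lossless, so storing PNG bytes in the TFRecord is one possible way to keep the pixels unchanged. The helper max_abs_diff is a name made up for this illustration.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for a decoded image (assumption: uint8 RGB, 64x64).
rng = np.random.default_rng(0)
original = tf.constant(rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8))

# JPEG round trip: quantization and chroma subsampling make this lossy,
# even at the highest quality setting.
jpeg_bytes = tf.io.encode_jpeg(original, quality=100)
jpeg_roundtrip = tf.io.decode_jpeg(jpeg_bytes)

# PNG round trip: lossless, so the decoded pixels match the input exactly.
png_bytes = tf.io.encode_png(original)
png_roundtrip = tf.io.decode_png(png_bytes)


def max_abs_diff(a, b):
    """Largest per-pixel difference between two uint8 images."""
    diff = tf.abs(tf.cast(a, tf.int32) - tf.cast(b, tf.int32))
    return int(tf.reduce_max(diff))


jpeg_error = max_abs_diff(original, jpeg_roundtrip)
png_error = max_abs_diff(original, png_roundtrip)
print("JPEG max pixel difference:", jpeg_error)
print("PNG max pixel difference:", png_error)
```

If the files on disk are already JPEGs, another option may be to put the raw file bytes (as returned by tf.io.read_file) into the record without decoding them first, so that training and inference both decode exactly the same bytes.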

Related

How to train custom object detection with tfrecord file

I want to train an object detection model. I annotated the data using Roboflow, exported it as TFRecords, and also got the .pbtxt file. After that I don't have any clue how to train a CNN model from scratch with just two or three hidden layers. I can't figure out how to use that TFRecord to fit the model I have created. Please help me out.
tfrecord files are usually used with Tensorflow Object Detection. It's pretty old and I haven't seen it used in practice recently, but there's a Tensorflow Object Detection tutorial here that uses these tfrecord files.
If there's not a particular reason you need to use TF Object Detection I'd recommend using a newer and more well-supported model like YOLOv5 or YOLOv7.
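For the "how do I read that TFRecord" part of the question, here is a rough sketch using tf.data. The feature keys below follow the TensorFlow Object Detection API convention and are an assumption about what the export contains; it is worth inspecting one record of your own file to confirm them. The synthetic record written to a temporary file exists only so that the example runs without real data, and parse_example and _floats are illustrative names.

```python
import os
import tempfile

import tensorflow as tf

# Assumed feature keys (TF Object Detection API convention).
# Verify them against your own export before relying on this.
FEATURES = {
    "image/encoded": tf.io.FixedLenFeature([], tf.string),
    "image/object/bbox/xmin": tf.io.VarLenFeature(tf.float32),
    "image/object/bbox/ymin": tf.io.VarLenFeature(tf.float32),
    "image/object/bbox/xmax": tf.io.VarLenFeature(tf.float32),
    "image/object/bbox/ymax": tf.io.VarLenFeature(tf.float32),
    "image/object/class/label": tf.io.VarLenFeature(tf.int64),
}


def parse_example(serialized):
    """Decode one serialized record into (image, boxes, labels)."""
    parsed = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.io.decode_jpeg(parsed["image/encoded"], channels=3)
    coords = [tf.sparse.to_dense(parsed["image/object/bbox/" + key])
              for key in ("ymin", "xmin", "ymax", "xmax")]
    boxes = tf.stack(coords, axis=-1)  # shape (num_boxes, 4)
    labels = tf.sparse.to_dense(parsed["image/object/class/label"])
    return image, boxes, labels


def _floats(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))


# Write one synthetic record so the sketch runs without real data.
jpeg = tf.io.encode_jpeg(tf.zeros([32, 32, 3], tf.uint8)).numpy()
example = tf.train.Example(features=tf.train.Features(feature={
    "image/encoded": tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[jpeg])),
    "image/object/bbox/xmin": _floats([0.1]),
    "image/object/bbox/ymin": _floats([0.2]),
    "image/object/bbox/xmax": _floats([0.5]),
    "image/object/bbox/ymax": _floats([0.6]),
    "image/object/class/label": tf.train.Feature(
        int64_list=tf.train.Int64List(value=[1])),
}))
path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

dataset = tf.data.TFRecordDataset(path).map(parse_example)
image, boxes, labels = next(iter(dataset))
print(image.shape, boxes.numpy(), labels.numpy())
```

Because the number of boxes varies per image, batching such a dataset would probably need padded_batch rather than batch, and a small custom model would still need a detection-specific loss before fit() can use these targets.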

How to load a video as a 4 dimensional tensor in tensorflow to train a 3D-CNN?

I have a binary classification task at hand. I have created two folders of videos corresponding to the two classes. I have also created an annotation CSV which contains the path of each file and its class.
I am not able to find a way to create a generator which could load an MP4 video as a 4-D tensor using Tensorflow. I did it in PyTorch quite easily using the torchvision.io.read_video function and extending the Dataset class. Is there any equivalent in Tensorflow?
I want to use it to train a 3D-CNN video classifier and feed it a 5D tensor of (batch_size, frames, height, width, channel).
PS: I don't want to save individual frames on disk.
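One possible approach, sketched under the assumption that OpenCV and NumPy are available: decode the video in memory with cv2.VideoCapture and stack a fixed number of evenly sampled frames into a (frames, height, width, channel) array. The function names read_frames and frames_to_clip are made up for this sketch, and the synthetic frames at the bottom stand in for a real file so that the example runs without one.

```python
import numpy as np


def read_frames(path):
    """Yield RGB frames from a video file using OpenCV (assumed installed)."""
    import cv2  # imported lazily so frames_to_clip works without OpenCV

    capture = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            yield cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    finally:
        capture.release()


def frames_to_clip(frames, num_frames):
    """Stack frames into a (num_frames, height, width, channels) array.

    Frames are sampled evenly across the video; short videos repeat frames.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("video contains no frames")
    indices = np.linspace(0, len(frames) - 1, num_frames).round().astype(int)
    return np.stack([frames[i] for i in indices])


# Synthetic frames stand in for read_frames("some_video.mp4").
fake_video = [np.full((4, 6, 3), i, dtype=np.uint8) for i in range(10)]
clip = frames_to_clip(fake_video, num_frames=4)
print(clip.shape)
```

A generator that yields (frames_to_clip(read_frames(path), n), label) pairs for each row of the annotation CSV can then be wrapped with tf.data.Dataset.from_generator, and calling batch() on that dataset adds the fifth (batch) dimension. Nothing is written to disk, but every frame of a video is held in memory while its clip is built, and all videos need the same height and width (or a resize step) to be batched together.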

Faster RCNN + inception v2 input size

What is the input size of the Faster R-CNN RPN?
I'm using the TensorFlow Object Detection API, which uses Faster R-CNN as the region proposal network (RPN) and Inception as the feature extractor (according to the config file). At prediction time the API works online and processes each input image individually. However, I'm now trying to feed images to the network in batches using the TensorFlow Dataset API.
To build a batch, all images first have to be resized to the same size. I think the best approach is to resize them to exactly the input size of Faster R-CNN, to avoid resizing twice. So my question is: what is the input size of the Faster R-CNN RPN?
Thanks in advance.
It depends on the input resolution which was specified in the pipeline config file, in image_resizer.
For example, for Faster R-CNN over InceptionV2 trained on COCO dataset, see this config file.
The specified resolution is 600x1024.
On a side note, fully convolutional architectures (such as RFCN, SSD, YOLO) don't restrict to a single resolution, i.e. you can apply them on different input resolution without modifying the architecture.
But this doesn't mean that the model will be robust to it if you're training on a single resolution.
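To make the 600x1024 setting concrete, here is a small sketch of how an aspect-preserving resizer with a minimum side of 600 and a maximum side of 1024 would pick the output size. This assumes the config uses a keep_aspect_ratio_resizer with min_dimension 600 and max_dimension 1024, and it reflects my understanding of that behaviour rather than code copied from the API; keep_aspect_ratio_size is a hypothetical helper name.

```python
def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
    """Return (new_height, new_width) under a min/max side constraint."""
    # First try: scale so that the shorter side equals min_dimension.
    scale = min_dimension / min(height, width)
    # If that pushes the longer side past max_dimension,
    # scale by the longer side instead.
    if round(max(height, width) * scale) > max_dimension:
        scale = max_dimension / max(height, width)
    return round(height * scale), round(width * scale)


for h, w in [(480, 640), (720, 1280)]:
    print((h, w), "->", keep_aspect_ratio_size(h, w))
```

Under these assumptions the output size still depends on each image's aspect ratio, so images resized this way are not guaranteed to share one shape. Batching them would still need padding or a fixed target size.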

what is the fastest way to feed video frames to tensorflow model?

There is a pre-trained Tensorflow model for image recognition. I want to use it as a feature extractor.
To increase performance, video frames are read with OpenCV (cv2.VideoCapture.read()) and buffered in frameBuffer.
In the next step I choose a batch of images and feed them to the graph after a set of pre-processing steps (to prepare the images for feeding):
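import cv2

for x in range(lowerIndex, upperIndex):
    frame = frameBuffer[x]
    # img_data is never used below; imencode receives the original BGR frame
    img_data = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 100]
    # JPEG-encode the frame at maximum quality
    result, img_str = cv2.imencode('.jpg', frame, encode_param)
    # tobytes() replaces the deprecated tostring()
    img_str = img_str.tobytes(order='C')
    batchFrameBuffer.append(img_str)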
The pre-processing step takes a significant amount of time, and during this time GPU utilization is 0.
Is there a way to keep the GPU working all the time (for example, using different threads: one for reading and pre-processing, and another for running the session)?
Tensorflow has the data.Dataset specifically for this. Check it out here.
You could create a dataset from frame_buffer, and then you can map a function to that dataset.
The function, however, needs to consist of tensorflow ops, but you need cv2 functions. For this, use tf.py_func (tf.numpy_function or tf.py_function in TensorFlow 2.x), as it allows you to wrap normal python code as tensorflow ops, read about it here.
The benefit of using dataset is that the multithreading is done in C++ by Tensorflow, rather than you having to manually do it in Python. You can even set the number of parallel threads as a parameter in dataset.map.
Once you have your dataset object, you can create an iterator from it, and build your graph starting from the iterator, so the overhead of using feed_dict is avoided as well.
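A rough sketch of the pipeline described above, written for TensorFlow 2.x (an assumption, since the answer itself refers to the TF1 API). It wraps a plain NumPy preprocessing function with tf.numpy_function, maps it over a frame buffer in parallel, and prefetches batches so that the next batch is prepared while the model is busy. The frame buffer is synthetic, and the preprocessing (BGR to RGB plus scaling) is only a placeholder for whatever your model actually needs.

```python
import numpy as np
import tensorflow as tf


def preprocess(frame):
    """Plain NumPy preprocessing: BGR -> RGB and scale to [0, 1]."""
    rgb = frame[..., ::-1]
    return (rgb / 255.0).astype(np.float32)


def tf_preprocess(frame):
    # tf.numpy_function is the TF 2.x counterpart of tf.py_func.
    out = tf.numpy_function(preprocess, [frame], tf.float32)
    out.set_shape(frame.shape)  # numpy_function drops static shape info
    return out


# Synthetic frame buffer (assumption: 8 BGR frames of 32x32 pixels).
frame_buffer = np.random.randint(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

dataset = (
    tf.data.Dataset.from_tensor_slices(frame_buffer)
    .map(tf_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the model runs
)

batches = [batch.numpy() for batch in dataset]
print(len(batches), batches[0].shape, batches[0].dtype)
```

One caveat: Python code wrapped this way still runs under the Python interpreter lock, so the parallel speed-up can be limited. Where a native TensorFlow op exists for a step, using it inside map is likely to scale better.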
Here is a general Tensorflow performance guide for GPUs.
Hope this helps!
Squadrick's answer is probably the best idea, but I also want to suggest preprocessing the video to extract frames as an independent step. If you have space for it, it'd be useful to have those frames independently. A lot of public video datasets (Amsterdam Ordinary Video Library, ImageNet VOD on Kaggle now) supply each video as a folder of jpegs (frame1.jpg, frame2.jpg, etc.). It saves decoding the video each time you want to train a new model. It’ll eat a lot of CPU and use a good amount of space, but it’ll speed up your model training. I’ve found mpeg frame extraction in opencv (C version) to be fairly slow, even just to display a frame.
Also, maybe FFMPEG could do your frame extraction to RGB as an independent step. Re-encoding to jpeg only to decode back to RGB for model training spends CPU time to save disk space.

Tensorflow model file size varies significantly

I am using a tensorflow framework and I have noticed that there are major variances in the size of the tensorflow model files.
For example, the framework provides 2 models:
a pretrained model, intended for example for fine-tuning,
and an untrained version.
They both have a size of 172.539 kb
When I apply fine tuning in my model with some minor changes in the graph (there is a module in framework for that) and save my model the size remains essentially the same: 178.525 kb.
First, I am a bit surprised that my fine-tuned model is somewhat bigger, since I only changed the last layer from 21 to 14 classes, so I would expect a somewhat smaller model file. But since the difference is so small, I didn't pay attention to it.
But when I trained the same model starting from the same model file (the pretrained one, I mean) and saved the model to disk, the file size was quite different: 340.097 kb. By "train" I mean that I allow the network to modify all parameters, not just the parameters of the last layer.
The model being implemented is a variation of ResNet for semantic image segmentation (in case someone can deduce the expected model file size from the model itself).
So, my questions are why I have such a variance in the model file sizes and how come my saved fine-tuned model is larger than the original model? Is there a way to include/exclude parameters in the model to be saved?
P.S.1 Some information that might be handy:
I am using tensorflow v2 model saving while I think the framework files use v1. I am not sure how to identify this besides the fact that the former produces 3 files.
The framework is called tensorflow-deeplab-resnet and can be found here and the models are here.
P.S.2
I am not sure Stack Overflow is the right place for this question either.
That is because, when you train a model and save it, Tensorflow also saves the extra variables that the optimizer creates for each trainable variable (often loosely described as saving the gradients of your ops).
So allowing training on the last layer will increase the size of your saved model a little. And allowing training on the whole model will essentially double the size of the save file, because every trainable variable will then have its optimizer state saved alongside it.
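A back-of-the-envelope check of that explanation against the sizes quoted in the question. It assumes that every saved value is a float32 and that the optimizer keeps one extra value per trainable parameter (for example a momentum accumulator). Both are assumptions, as is the 1% share used for the last layer. checkpoint_size_kb is an illustrative helper, not a TensorFlow function.

```python
BYTES_PER_VALUE = 4  # assumption: every saved value is a float32


def checkpoint_size_kb(num_params, num_trainable, slots_per_trainable):
    """Estimate file size when the optimizer keeps extra per-variable slots."""
    saved_values = num_params + num_trainable * slots_per_trainable
    return saved_values * BYTES_PER_VALUE / 1000


# Parameter count implied by the 172,539 kB pretrained file in the question.
num_params = 172_539 * 1000 // BYTES_PER_VALUE

# No trainable variables, so no optimizer state is saved.
weights_only = checkpoint_size_kb(num_params, 0, 1)
# Hypothetical: only a small last layer (here 1% of the weights) is trainable.
last_layer_only = checkpoint_size_kb(num_params, num_params // 100, 1)
# Every parameter trainable, one optimizer slot each.
full_training = checkpoint_size_kb(num_params, num_params, 1)

print(weights_only, last_layer_only, full_training)
```

Under these assumptions full training gives about 345,000 kB, close to the 340,097 kB reported. The remaining gap could plausibly come from variables that are saved but not trained, which get no optimizer slot. If the extra state is not needed in the file, TF1-style tf.train.Saver accepts a var_list argument, so passing only the model variables should leave the optimizer slots out.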