How is the reconstructed YUV file produced in the open-source x265 encoder? - hevc

The x265 encoder produces a reconstructed YUV video file after decoding. As far as I know, in the open-source x265 encoder you input a raw YUV video and it generates an HEVC file for you along with a reconstructed YUV file.
My question is: can I input an HEVC file directly to produce the reconstructed YUV file?
If yes, how?

No. You cannot input an HEVC file directly to an HEVC encoder, because encoders have no entropy-decoding module (they do include the other decoding modules, such as motion compensation, IDCT, and inverse quantization). The entire point of the encoder is to encode raw video into an HEVC bitstream. Instead, you can feed the HEVC stream to an HEVC decoder and get the exact same reconstructed YUV. Bit-matching the encoder's reconstructed YUV against the YUV decoded by a standard decoder verifies that the encoder is working properly.
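For illustration, a minimal sketch of that check, assuming ffmpeg is on PATH, an 8-bit 4:2:0 stream, and hypothetical filenames (stream.hevc is the encoder's output, recon.yuv its reconstruction file):

    import hashlib
    import subprocess

    # Decode the HEVC stream to raw YUV with an independent decoder (ffmpeg).
    subprocess.run(
        ["ffmpeg", "-i", "stream.hevc", "-pix_fmt", "yuv420p", "decoded.yuv"],
        check=True,
    )

    def sha256_of(path):
        # Hash in chunks so large YUV files are not loaded into memory at once.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # Bit-match the encoder's reconstruction against the decoder's output.
    print(sha256_of("decoded.yuv") == sha256_of("recon.yuv"))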

x265 is an HEVC encoder, not a decoder. You can use ffmpeg or OpenHEVC to decode an HEVC stream.

x265 encoder produces a reconstructed YUV video file after decoding
[..]
it generates an HEVC file for you along with a reconstructed YUV file
Let me clear this up a bit. Yes, x265 (and other encoders) can create a YUV file during encoding. However, it's not for decoding purposes. It's for debugging purposes. The purpose of an encoder is to create a video stream that can be decoded by a decoder. In order to do that, the encoder and decoder need to agree on an intermediary exchange format, i.e. the standardized bitstream format.
This might seem obvious - in this case, I mean HEVC - but you have to understand that encoders (and decoders) can have bugs. How do you find these bugs? You test! How do you test? You look at the bitstream generated by the encoder, and the YUV representation that the encoder believes the decoder would have created while decoding. Then you decode the file using an independent decoder and check that the two YUV files are identical.
The important thing here is that the encoder didn't actually decode anything. Rather, the YUV is the actual internal bitmap representation that the encoder believes a decoder should reconstruct, given the block/mode choices specified in the bitstream. The encoder never decoded; it just reconstructed the bitmap as it believes the decoder would have done, and then encoded that reconstruction information in the bitstream. (And for reference frames, the bitmap is then used as a reference for subsequently encoded frames.)
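For reference, a hedged sketch of asking the x265 CLI to write out that internal reconstruction during encoding (assumes x265 is on PATH; the 1920x1080, 30 fps, 8-bit 4:2:0 input and all filenames are hypothetical):

    import subprocess

    # --recon writes the encoder's internal reconstruction, for debugging only.
    subprocess.run(
        [
            "x265",
            "--input", "in.yuv",
            "--input-res", "1920x1080",
            "--fps", "30",
            "--recon", "recon.yuv",
            "--output", "out.hevc",
        ],
        check=True,
    )

recon.yuv can then be bit-compared against the YUV produced by an independent decoder, as shown in the answer above.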

Related

Image decoded/encoded/decoded in Tensorflow is not the same as the original

I store images in TFRecord files for training an image classification model with TensorFlow 2.10. The TFRecords are read into a dataset, on which I call fit(). After training, I run inference with:
an image from the dataset
the same image read from disk
I notice that the predictions are not the same, because in the first case the image undergoes (in the process of writing and then reading the TFRecord file to build the dataset) a decode/encode/decode transformation (the TF functions tf.io.decode_jpeg and tf.io.encode_jpeg) that is not symmetric: the image after the transformation is not the same as the original image (even if I encode with quality=100).
This can make a difference: in the first case, the correct class is not in the top 3; in the second case, it is.
Is there any way to avoid this asymmetric behavior?
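For what it's worth, a minimal sketch that reproduces the round trip the question describes (the filename is hypothetical; any JPEG will do):

    import tensorflow as tf

    raw = tf.io.read_file("sample.jpg")

    decoded = tf.io.decode_jpeg(raw)                     # first decode
    reencoded = tf.io.encode_jpeg(decoded, quality=100)  # lossy re-encode
    decoded_again = tf.io.decode_jpeg(reencoded)         # second decode

    # JPEG is lossy even at quality=100, so this is typically non-zero.
    diff = tf.reduce_max(tf.abs(tf.cast(decoded, tf.int32) -
                                tf.cast(decoded_again, tf.int32)))
    print("max absolute pixel difference:", int(diff))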

Use of VideoFileWriter with h264_amf encoder

I'm using an AMD GPU, and VideoFileWriter does exactly what I need;
the only thing I haven't found is a way to use the h264_amf encoder instead of the usual h264 encoder.

How to understand the output key of tflite file

I am learning from and running the TensorFlow Pi Camera example, but I don't know how to interpret the output keys of the tflite file, such as the key "quantization" in https://github.com/tensorflow/examples/blob/master/lite/examples/image_classification/raspberry_pi/classify_picamera.py#L52. Is there any documentation?
You can look at this article about quantization:
The fundamental idea behind quantization is that if we convert the weights and inputs into integer types, we consume less memory, and on certain hardware the calculations are faster.
I'm not sure what you are asking about the output file, but you can process your tflite file with this git repo to understand what input your model needs and what the output will be.
You could refer to the description of post-training quantization at https://www.tensorflow.org/lite/performance/post_training_quantization#representation_for_quantized_tensors and the quantization specification at https://www.tensorflow.org/lite/performance/quantization_spec.
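For concreteness, a hedged sketch of reading those quantization parameters through the tf.lite.Interpreter API (the model path is hypothetical); per the quantization spec linked above, the real value is scale * (quantized_value - zero_point):

    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    output_details = interpreter.get_output_details()[0]
    scale, zero_point = output_details["quantization"]  # (0.0, 0) for float models
    print("scale:", scale, "zero_point:", zero_point)

    # After interpreter.invoke(), dequantize the raw integer output like so:
    #   quantized = interpreter.get_tensor(output_details["index"])
    #   real = scale * (quantized.astype("float32") - zero_point)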

Tensorflow: partially decode binary data

I am wondering if there is a native TensorFlow function that allows decoding a binary file (for example a TFRecord) starting from a given byte offset and reading the following N bytes, without decoding the entire file.
This has been implemented for JPEG images (tf.image.decode_and_crop_jpeg),
but I cannot find a way to do the same thing with an arbitrary binary file.
This would be very useful when the cropping window is much smaller than the whole data.
Currently, I am using a custom tf.py_func as the mapping function of a Dataset object. It works, but with all the limitations of a custom py_func.
Is there a native TensorFlow way to do the same thing?
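For illustration, a hedged sketch of the py_func-style workaround the question describes: seek to the offset and read N bytes with ordinary Python file I/O inside tf.py_function (the filename, offset, and length are hypothetical):

    import tensorflow as tf

    def read_slice(path, offset, n_bytes):
        # Plain Python file I/O: seek and read without loading the whole file.
        with open(path.numpy().decode(), "rb") as f:
            f.seek(int(offset))
            return tf.constant(f.read(int(n_bytes)))

    # Hypothetical window: 128 bytes starting at byte 1024 of records.bin.
    ds = tf.data.Dataset.from_tensor_slices(["records.bin"])
    ds = ds.map(lambda p: tf.py_function(read_slice, [p, 1024, 128], tf.string))

    for chunk in ds:
        print(len(chunk.numpy()))  # -> 128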

RGB or BGR for Tensorflow-slim-ResNet V2 pre-trained model?

For CNN training, the expected order of the input image channels can differ from library to library, and even from model to model. For Caffe, the input image is usually expected to be in BGR order, while in TensorFlow the order can be arbitrary.
So does anyone know for sure in what order (BGR or RGB) the ResNet_V2 pre-trained model of the TensorFlow slim library was trained? It reads in the documentation that: [..]
And I checked the script at this link: https://github.com/tensorflow/models/blob/master/research/slim/datasets/build_imagenet_data.py, which says the image is encoded in RGB. But I'm still not sure in which order ResNet_V2 was trained.
Does anyone have similar confusion about this issue? Thanks for any feedback!
It is RGB. The colorspace depends on how the image was read into memory during data preparation. Caffe uses OpenCV for many image operations, and OpenCV defaults to reading images in BGR order, while in the TensorFlow universe it is more common to rely on the PIL library, which reads images in RGB order.
The colorspace stated in the script is RGB; see line 206.
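As a small illustration of the channel-order pitfall, a sketch that converts an OpenCV-loaded (BGR) image to RGB order before feeding it to the model (the filename is hypothetical):

    import cv2
    import numpy as np

    bgr = cv2.imread("input.jpg")  # OpenCV reads channels in BGR order

    # Reverse the channel axis to get RGB.
    rgb = bgr[..., ::-1]

    # Equivalent explicit conversion:
    rgb2 = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    assert np.array_equal(rgb, rgb2)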