Is it allowed for a stream encoded with FlateDecode and a PNG predictor to have a last predictor row that is narrower than all the other rows, i.e. one that is missing some data?
Imagine, for example, a stream that has already been decoded using the Flate algorithm, resulting in 105 bytes, and a predictor with the parameters <</Predictor 15 /Columns 10>>.
Since the stream has 105 bytes, the predictor can decode 10 full rows of 10 columns each, plus one row with only 5 columns, i.e. the data for the last 5 columns is missing. Should the last row be decoded as a row with only 5 columns, should the last 5 bytes be discarded, or is the stream as a whole simply invalid?
I didn't find anything about this in the PDF specification, but I came across two PDF files in the wild that have such streams.
It is up to you to decide how to deal with such streams; the PDF specification does not say how to handle invalid data.
For example, we take all the data that can be decoded and pad the rest with zeros.
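For illustration, here is a minimal sketch in Python of that lenient strategy, assuming 8-bit samples and one colour component per pixel, and handling only the None, Sub and Up PNG filters (Average and Paeth are omitted). `decode_png_predictor` is a hypothetical helper, not part of any PDF library:

```python
def decode_png_predictor(data, columns, colors=1, bpc=8):
    """Apply PNG row predictors (Predictor >= 10) to Flate-decoded data.

    A short final row is zero-padded rather than rejected -- one lenient
    choice for invalid streams; discarding the partial row is equally valid.
    """
    bpp = (colors * bpc + 7) // 8           # bytes per complete pixel
    row_len = columns * colors * bpc // 8   # data bytes per row
    stride = row_len + 1                    # plus one filter-type byte
    prev = bytearray(row_len)               # previous decoded row (all zeros at start)
    out = bytearray()
    for off in range(0, len(data), stride):
        ftype = data[off]
        row = bytearray(data[off + 1:off + stride])
        if len(row) < row_len:              # short last row: pad with zeros
            row.extend(b"\x00" * (row_len - len(row)))
        cur = bytearray(row_len)
        for i in range(row_len):
            left = cur[i - bpp] if i >= bpp else 0
            if ftype == 0:                  # None
                cur[i] = row[i]
            elif ftype == 1:                # Sub: add byte to the left
                cur[i] = (row[i] + left) & 0xFF
            elif ftype == 2:                # Up: add byte from the row above
                cur[i] = (row[i] + prev[i]) & 0xFF
            else:
                raise ValueError("filter type %d not handled in this sketch" % ftype)
        out.extend(cur)
        prev = cur
    return bytes(out)
```

With 105 bytes and /Columns 10, each predictor row is 11 bytes (one filter byte plus 10 data bytes), so the same question of a short trailing row arises, and the code above simply pads it.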
I always thought that, being a binary format, TFRecord would consume less space than a human-readable CSV. But when I tried to compare them, I saw that this is not the case.
For example, here I create a num_rows x 10 matrix with num_rows labels and save it as a CSV. I do the same by saving it to TFRecords:
import pandas as pd
import tensorflow as tf
from random import randint

num_rows = 1000000
df = pd.DataFrame([[randint(0, 300) for r in xrange(10)] + [randint(0, 1)] for i in xrange(num_rows)])
df.to_csv("data/test.csv", index=False, header=False)

writer = tf.python_io.TFRecordWriter('data/test.bin')
for _, row in df.iterrows():
    arr = list(row)
    features, label = arr[:-1], arr[-1]
    example = tf.train.Example(features=tf.train.Features(feature={
        'features': tf.train.Feature(int64_list=tf.train.Int64List(value=features)),
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))
    writer.write(example.SerializeToString())
writer.close()
Not only does it take far more time to create the binary file than the CSV (2 sec vs 1 min 50 sec), it also uses almost twice as much space (38 MB vs 67.7 MB).
Am I doing this correctly? How can I make the output file smaller? (I saw TFRecordCompressionType, but is there anything else I can do?) And what is the reason for the much bigger size?
Vijay's comment regarding int64 makes sense, but it still does not answer everything. An int64 consumes 8 bytes, and because I am storing the data in a CSV, the string representation of each integer should also be of length 8. So if I do df = pd.DataFrame([[randint(1000000, 99999999) for r in xrange(10)] for i in xrange(num_rows)]), I still get a slightly bigger size: now it is 90.9 MB vs 89.1 MB. On top of this, the CSV spends 1 byte on the comma between each pair of integers.
The fact that your file is bigger is due to the overhead that TFRecord has for each row, in particular the fact that the feature names ('features', 'label') are stored every time.
In your example, if you increase the number of features (from 10 to say 1000) you will observe that your tfrecord file is actually about half the size of the csv.
Also, the fact that the integers are stored on 64 bits is ultimately irrelevant, because the serialization uses a "varint" encoding whose size depends on the value of the integer, not on its declared width. Take your example above and, instead of a random value between 0 and 300, use a constant value of 300: you will see that the file size increases.
Note that the number of bytes used for the encoding is not exactly the number of bytes of the integer itself: a value of 255 still needs two bytes, while a value of 127 takes one byte. Interestingly, negative values come with a huge penalty: 10 bytes of storage no matter what.
The correspondence between values and storage requirements can be seen in protobuf's function _SignedVarintSize.
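As a sketch, the varint length rule can be reproduced in a few lines of Python. This mimics what protobuf does internally; `varint_len` is just an illustrative helper, not a protobuf API:

```python
def varint_len(value):
    """Number of bytes protobuf's varint encoding needs for an int64 value.

    Each varint byte carries 7 payload bits, so the length grows by one
    byte at every power of 128. Negative int64s are sign-extended to the
    full 10 bytes.
    """
    if value < 0:
        return 10
    n = 1
    while value >= 0x80:   # more than 7 bits left: another byte needed
        value >>= 7
        n += 1
    return n
```

So values 0-127 take one byte, 128-16383 take two, and so on, which is why a constant 300 costs more than random values drawn from 0-300.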
This may be because your generated numbers are in the range 0~300, so they need at most 3 characters in the CSV, but when they are stored in a TFRecord as int64, each needs at least 8 bytes (not very sure) to store. If your generated numbers were in the range 0~2^64-1, I think the TFRecord file would be much smaller than the CSV file.
Actually, I am dealing with many pictures that come from different videos, so I use tf.SequenceExample() to save them as different sequences, with their labels attached, into a TFRecord.
But after running my code to generate the TFRecord, it produces a TFRecord of 29 GB, far larger than my original 3 GB of pictures.
Is it normal for a TFRecord to be larger than the original data?
You may be storing the decoded images instead of the JPEG-encoded ones. TFRecord has no concept of image formats, so you can use any encoding you want. To keep the size the same, convert the original image file contents to a BytesList and store that, without calling decode_image or using any image libraries or anything that understands image formats.
Another possibility is that you might be storing the image as an Int64List full of bytes, which would be 8x the size. Instead, store it as a BytesList containing a single bytes value.
Check the type of data you load. I guess you load the images as pixel data. Every pixel is uint8 (8 bit) and is likely converted to float (32 bit). Hence you have to expect the result to be 4 times the original size (3 GB -> 12 GB).
Also, the original format might have (better) compression than TFRecord. (I'm not sure whether TFRecord can use compression.)
I am working on rate control algorithms in HEVC and I have a problem. Can anyone help me with how to find each frame of an input video in HEVC? I need to work on each frame as a distinct image, but I don't know in which variable of the HEVC code each input frame is stored and processed.
To find a new frame:
Parse the input data as per the HEVC draft (Annex B) to find the start of a NAL unit (start_code_prefix_one_3bytes). This gives you the NAL unit boundaries.
If you find a VCL NAL unit, parse the slice header: first_slice_segment_in_pic_flag equal to 1 specifies that the slice segment is the first slice segment of the picture in decoding order.
Repeat the same process until you get the next VCL NAL unit with first_slice_segment_in_pic_flag equal to 1; that indicates the start of the next frame.
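The steps above can be sketched in Python. This is a simplified illustration only: it ignores emulation-prevention bytes (0x000003), assumes a clean Annex-B byte stream, and reads first_slice_segment_in_pic_flag as the first bit after the 2-byte NAL unit header:

```python
def find_frame_starts(bitstream):
    """Return byte offsets where new frames begin in an Annex-B HEVC stream.

    Scans for 00 00 01 start codes (a 4-byte start code just has one extra
    leading zero), reads the NAL unit type from the 2-byte NAL header, and
    for VCL NAL units (nal_unit_type < 32) checks the first bit of the
    slice segment header, which is first_slice_segment_in_pic_flag.
    """
    starts = []
    i = 0
    n = len(bitstream)
    while i + 3 < n:
        if bitstream[i:i + 3] == b"\x00\x00\x01":
            hdr = i + 3
            if hdr + 2 < n:
                # NAL header: forbidden_zero_bit(1) | nal_unit_type(6) | ...
                nal_type = (bitstream[hdr] >> 1) & 0x3F
                if nal_type < 32:  # VCL NAL unit
                    first_slice_flag = bitstream[hdr + 2] >> 7
                    if first_slice_flag:
                        starts.append(i)
            i = hdr
        else:
            i += 1
    return starts
```

In the HM reference software the equivalent logic lives in the Annex-B reader and slice header parser; this standalone sketch only shows the principle.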
I'm using an Omnivision OV5620.
http://electronics123.net/amazon/datasheet/OV5620_CLCC_DS%20(1.3).pdf
This is the datasheet.
There you can see the output format: 10-bit digital RGB raw data.
First, I know that RGB raw data means a Bayer array.
So, does 10-bit RGB mean each channel has a 1024-step scale, i.e. a range of 0~1023?
Or is it 8-bit RGB per channel, with the leftover LSBs packed together as a fifth pixel byte?
Please refer to the image.
Which is correct?
They pack every four adjacent 10-bit pixels (0..1023) of a line into 5 sequential bytes, where each of the first 4 bytes contains the 8 MSBs of one pixel, and the 5th byte contains the 2 LSBs of all four pixels packed together into one byte.
This is a convenient format, because if you want to convert it to RGB8 you can just ignore that fifth byte.
Also, each displayed line begins with a packet header (PH) byte and ends with a packet footer (PF) byte, and the whole frame begins with a frame start (FS) byte and ends with a frame end (FE) byte.
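A rough sketch of unpacking such a 5-byte group in Python. The placement of the two LSBs of pixel 0 in the lowest bits of the fifth byte is an assumption here (the exact bit order should be checked against the datasheet):

```python
def unpack_raw10(group):
    """Unpack one 5-byte group into four 10-bit pixel values (0..1023).

    Assumed layout: bytes 0-3 hold the 8 MSBs of pixels 0-3; byte 4
    packs the four 2-bit LSBs, pixel 0 in its two lowest bits.
    """
    msb = group[:4]
    lsb = group[4]
    return [(msb[i] << 2) | ((lsb >> (2 * i)) & 0x3) for i in range(4)]
```

To get 8-bit values, just take `msb` and drop the fifth byte, which is what makes the format convenient for RGB8 conversion.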
What is the maximum number of objects that a PDF file can have?
From the PDF specifications:
In general, PDF does not restrict the size or quantity of things described in the file format, such as numbers, arrays, images, and so on.
...
PDF itself has one architectural limit. Because ten digits are allocated to byte offsets, the size of a file is limited to 10^10 bytes (approximately 10 GB).