Maximum number of objects in a PDF

What is the maximum number of objects that a PDF file can have?

From the PDF specifications:
"In general, PDF does not restrict the size or quantity of things described in the file format, such as numbers, arrays, images, and so on.
...
PDF itself has one architectural limit. Because ten digits are allocated to byte offsets, the size of a file is limited to 10^10 bytes (approximately 10 GB)."

Related

TFRecord larger than the original data

Actually, I am dealing with many pictures taken from different videos, so I use tf.SequenceExample() to save them as separate sequences, with their labels attached, into a TFRecord.
But after running my code to generate the TFRecord, the output file is 29GB, far larger than my original pictures at 3GB.
Is it normal for a TFRecord to be larger than the original data?
You may be storing the decoded images instead of the JPEG-encoded ones. TFRecord has no concept of image formats, so you can use any encoding you want. To keep the size the same, convert the original image file contents to a BytesList and store that, without calling decode_image or using any image libraries or anything else that understands image formats.
Another possibility is that you might be storing the image as an Int64List full of bytes, which would be 8x the size. Instead, store it as a BytesList containing a single bytes value.
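A minimal sketch of the BytesList approach, using a plain tf.train.Example for brevity (the question uses tf.SequenceExample, but the same feature types apply); the file name and label below are made up:

import tensorflow as tf

# Read the raw JPEG bytes -- no decode_image, no pixel arrays, so the record
# stays roughly the size of the source file. The path is a placeholder.
with open("frame_0001.jpg", "rb") as f:
    jpeg_bytes = f.read()

feature = {
    # The compressed image bytes go into a BytesList with a single entry.
    "image/encoded": tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[jpeg_bytes])
    ),
    # Example label; an Int64List is fine for small scalars like this.
    "image/label": tf.train.Feature(
        int64_list=tf.train.Int64List(value=[1])
    ),
}
example = tf.train.Example(features=tf.train.Features(feature=feature))

with tf.io.TFRecordWriter("frames.tfrecord") as writer:
    writer.write(example.SerializeToString())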
Check the type of data you load. I guess you load the images as pixel data. Every pixel is uint8 (8 bits) and is likely converted to float (32 bits), so you should expect roughly 4 times the original size (3 GB -> 12 GB).
Also, the original format might have (better) compression than TFRecords. (I'm not sure if TFRecords can use compression.)
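A quick illustration of that 4x blow-up, using the question's 640x427 frame size (the frame contents here are just dummy zeros):

import numpy as np

# One 640x427 RGB frame as uint8 pixels vs. the same frame cast to float32.
frame_u8 = np.zeros((427, 640, 3), dtype=np.uint8)
frame_f32 = frame_u8.astype(np.float32)
print(frame_u8.nbytes, frame_f32.nbytes)  # 819840 vs 3279360 bytes, i.e. 4x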

Resizing multi-page mixed-format PDF with Ghostscript?

I have multi-page PDF files with mixed formats, from A4 (portrait) to A0 (landscape).
Is Ghostscript capable of resizing the pages larger than A3 down to A3, while leaving the smaller pages (A4) untouched?
First, Ghostscript doesn't do manipulations of the input; you should read ghostpdl/doc/vectordevices.htm to see how Ghostscript and the pdfwrite device actually work.
Out of the box, no: Ghostscript and the pdfwrite device won't let you produce output whose media size differs from the input and varies per page (you can have it produce output sized to a single media size). It can, of course, be done, but it will involve some programming, and in PostScript at that.
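For reference, a sketch of that out-of-the-box, single-media-size behaviour, invoked from Python here so the examples stay in one language (the file names are made up):

import subprocess

# Force every page onto A3 media and scale its content to fit; pdfwrite
# cannot pick a different media size per page without custom PostScript.
subprocess.run([
    "gs",
    "-sDEVICE=pdfwrite",
    "-sPAPERSIZE=a3",
    "-dFIXEDMEDIA",    # keep the requested media size
    "-dPDFFitPage",    # scale each page's content to fit that media
    "-o", "resized-a3.pdf",
    "mixed-sizes.pdf",
], check=True)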
You would probably want to look at the pdf_PDF2PS_matrix routine in ghostpdl/Resource/Init/pdf_main.ps:
% Compute the matrix that transforms the PDF->PS "default" user space
/pdf_PDF2PS_matrix { % <pdfpagedict> -- matrix
...
This routine calculates the scale factors required when resizing content to fit the media.
Also look at pdfshowpage_setpage:
/pdfshowpage_setpage { % <pagedict> pdfshowpage_setpage <pagedict>
6 dict begin % for setpagedevice
% Stack: pdfpagedict
...
This is where the selection of the media size takes place.
After spending a long time looking for a solution, I found a great - and yet affordable - tool capable of doing the resizing and a lot more: PStill (http://www.pstill.com/).

Animated GIF larger than source images

I'm using imagemagick to create an animated GIF out of ~60 JPG 640x427px photos. The combined size of the JPGs is about 4MB.
However, the output GIF is ~12MB. Is there a reason why the GIF is considerably bigger? Can I conceivably achieve a GIF size of ~4MB?
The command I'm using is:
convert \
  -channel RGB \
  -delay 2x10 \
  -size 640 \
  -loop 0 \
  -dispose Background \
  -layers Optimize \
  portrait/*.jpg portrait.gif
# -channel RGB and -dispose Background: no improvement in size
# -layers Optimize: about 2 MB improvement
Using gifsicle didn't seem to help either.
JPG is lossy compression.
GIF is lossless compression.
A better comparison would be to convert all the source images to GIF first, then combine them.
The first Google hit for GIF compression is http://ezgif.com/optimize, which claims lossy GIF compression; it might work for you, but I offer no warranty as I haven't tried it.
JPEG achieves its compression through a (lossy) transform, where a 16x16 or 8x8 block of pixels is transformed into a frequency representation and then quantized. Instead of keeping e.g. 256 levels (i.e. 8 bits) per red/green/blue component, JPEG can ignore some frequency components, or use just 1 or 2 bits to represent them.
GIF, on the other hand, works by identifying repeated patterns in a paletted image (up to 256 entries) which occur exactly in the previously encoded/decoded stream. Both because of the JPEG compression and because of the kind of content typically encoded as JPEG (natural full-colour photos), the probability of (long) exact matches is quite low.
60 RGB images at 640x427 is about 16 million pixels. Representing that much in 4 MB requires a compression of about 2 bits per pixel. To achieve this with GIF would require a very lossy algorithm that quantizes true-colour pixels not just to the closest entry in the target GIF palette, but also based on how good a dictionary of code words that particular selection produces. The dictionary builds up slowly, and to reach 2 bits/pixel the average decoded code word would have to cover about 5.5 matching pixels in the close neighbourhood.
By contrast, imagemagick has already compressed the 16 million pixels (each selected from a palette of 256 entries) to 75% of their raw size!
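A quick back-of-the-envelope check of those figures (the sizes are the approximate ones from the question):

frames, width, height = 60, 640, 427
pixels = frames * width * height        # about 16.4 million pixels

jpeg_bits = 4 * 1024 * 1024 * 8         # ~4 MB of JPEG input
print(jpeg_bits / pixels)               # ~2.0 bits per pixel to match it

gif_bytes = 12 * 1024 * 1024            # ~12 MB GIF output
raw_palette_bytes = pixels              # one 8-bit palette index per pixel
print(gif_bytes / raw_palette_bytes)    # ~0.77, roughly the 75% figure above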

How to calculate how much data can be embedded into an image

I want to know how much data can be embedded into images of different sizes.
For example, how much data can be stored in a 30kb image file without distorting the image?
It depends on the image type and the algorithm. If I take as an example a 24-bit bitmap image used to store ASCII characters:
Number of ASCII characters that can be stored = number of pixels / 8 (one ASCII character = 8 bits, one bit hidden per pixel)
It depends on two points:
How many bits per pixel are in your image.
How many bits you will embed in each pixel.
OK, let's suppose that your colour model is RGB and each pixel = 8*3 bits (one byte for each colour), and you want to embed 3 bits in each pixel.
Data that can be embedded into the image = (number of pixels * 3) bits
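A tiny worked example of that formula (the image dimensions here are made up, not taken from the question):

# Hypothetical 200x150 RGB image, embedding 3 bits per pixel (one LSB per channel).
width, height = 200, 150
bits_embedded_per_pixel = 3

capacity_bits = width * height * bits_embedded_per_pixel
capacity_bytes = capacity_bits // 8      # 8 bits per ASCII character / byte
print(capacity_bits, capacity_bytes)     # 90000 bits -> 11250 bytes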
If you use the LSB to hide your information, this would give 30000 bits of available space to use, i.e. 3750 bytes.
Since the LSB represents a 1 or a 0 in a byte that takes values from 0-255, in the worst-case scenario, where you modify all the LSBs, you get a distortion of 1/256, which equals about 0.4%.
In the statistically average scenario you would get about 0.2% distortion.
So it depends on which bit of the byte you are going to change.
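The distortion figures above in a couple of lines of arithmetic (no assumptions beyond what the answer states):

worst_case = 1 / 256        # every LSB has to be flipped
average_case = 0.5 / 256    # on average, half the LSBs already match the payload
print(f"{worst_case:.1%} {average_case:.1%}")  # 0.4% 0.2%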

Compressing/Optimizing Vectors in PDF

I have a PDF of a scanned book; the images are in JBIG2 format (B&W). I'd like to convert this to a vector PDF, which I can do easily by extracting the images and converting them to PDF vector graphics instructions with potrace.
The reason for this is that I want the PDF to display smoothly and quickly on an ebook reader device, such as a Kindle. With JBIG2 it is not doing this very well. Depending on the settings, the Kindle can't display the PDF, and even with that fixed it takes a long time to render each page. With a vector PDF the performance is much better, and the rendering very crisp.
The problem is that the resulting PDF is gigantic in file size. Even with the streams gzcompressed to the max, it is 300KB per page (the original JBIG2 images were 30KB per page).
Is there any way I can optimize the vector graphics so that the filesize is much less?
Here is a segment of the vector drawing instructions:
0.100000 0.000000 0.000000 0.100000 0.000000 0.000000 cm
0 g
8277 29404 m
8263 29390 8270 29370 8289 29370 c
8335 29370 8340 29361 8340 29284 c
8340 29220 8338 29210 8323 29210 c
8194 29207 8141 29208 8132 29214 c
8125 29218 8120 29248 8120 29289 c
8120 29356 8121 29358 8150 29370 c
8201 29391 8184 29400 8095 29400 c
8004 29400 7986 29388 8033 29357 c
8056 29342 8057 29338 8057 29180 c
8058 29018 l
8029 29008 l
8012 29002 8001 28993 8003 28986 c
h
f
I would have thought that the numbers could be compressed down very easily, but apparently not. One page is 800KB uncompressed (as above) and 300KB gzcompressed. I would have thought that the compression ratio could be much better, considering how the instructions are all numbers in similar ranges.
I am afraid there's not much that can be done about this.
Of course, you might try to use LZW compression on PDF page streams (instead of Deflate) but it probably won't make much difference.
Some other suggestions:
Smooth the source image as much as possible / remove as many details as possible. This might produce fewer curves (i.e. less data) during conversion.
Try to optimize the values in the PDF page stream. For example, you might try sophisticated combinations of scale/translate operators together with changes to the data. The goal here is to reduce the length of the operands.
For example, you might divide all operands (using integer, not floating-point, division) by, say, 100 and add a scaling operation before the first operator; a rough sketch follows after this list. This approach will most probably degrade the visual quality, though.
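A rough sketch of that divide-and-rescale idea, assuming page_stream already holds one page's uncompressed content stream as text (as in the excerpt above). The divisor is a parameter: the answer mentions 100, but a smaller factor such as 10 loses less precision. A real tool would parse the stream properly rather than use a regex:

import re

def shorten_coordinates(page_stream, factor=10):
    # Integer-divide every bare integer operand (the path coordinates),
    # leaving the existing floating-point cm matrix untouched.
    def shrink(match):
        return str(int(match.group(0)) // factor)

    body = re.sub(r"(?<![\d.])-?\d+(?![\d.])", shrink, page_stream)

    # Prepend a compensating scale so the geometry stays roughly the same;
    # the rounding above is where the quality loss comes from.
    return f"{factor} 0 0 {factor} 0 0 cm\n{body}"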
And of course, if you are going to do this to only a handful of files then I would say it's not worth the time.