How can I add bleed to a PDF book cover using Photoshop? - pdf

I have a book cover, sent to me as a PDF, that matches the 5x8 dimensions, but it does not have bleed and I need to add it. I am using Adobe InDesign CC 2015 and Adobe Photoshop CC 2015. How can I do that?

Create a document in InDesign and place the PDF into the document, centered. The document must have larger dimensions to allow for the bleed. Then fill the bleed area with whatever is necessary.
A better way would be to load the PDF file into Illustrator and add the bleed there. Color matching will be a lot more accurate that way, especially if the PDF contains any vector artwork such as outlined fonts. Make sure you've got the correct fonts installed.
If the PDF only contains bitmap images, you could import it into Photoshop. Make sure you've created a new document with the needed dimensions and resolution; 300 to 600 dpi would be a good starting point. Type will usually not look clean and sharp when using Photoshop, since it works on a picture (bitmap). Illustrator or InDesign would make type much cleaner.
If it's possible to get the source files instead, that would be the preferred method. Even if you have to redo the artwork, it would be better to have the original source files, pictures, fonts, etc.
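To make "bigger dimensions for the bleed" concrete, here is a small arithmetic sketch. The 0.125 inch bleed per edge is an assumption (a common print spec, but check what your printer asks for), and the PageSize and add_bleed names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSize:
    """Page dimensions in inches."""
    width_in: float
    height_in: float

    def to_pixels(self, dpi):
        # Pixel dimensions needed for a raster (Photoshop) document.
        return (round(self.width_in * dpi), round(self.height_in * dpi))


def add_bleed(trim, bleed_in=0.125):
    # Bleed is added on every edge, so each dimension grows by twice the bleed.
    # The 0.125 in default is an assumption, not something from the question.
    return PageSize(trim.width_in + 2 * bleed_in, trim.height_in + 2 * bleed_in)


trim = PageSize(5.0, 8.0)          # the 5x8 cover from the question
with_bleed = add_bleed(trim)
print(with_bleed)                  # document size to create
print(with_bleed.to_pixels(300))   # pixel size at 300 dpi
```

With these assumed numbers the new document would be 5.25 x 8.25 inches, and the placed PDF sits centered so that 0.125 inch of bleed surrounds it on each side.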

Related

How to transfer OCR text from one PDF to another PDF?

I have two versions of the same scanned PDF. One of them has an OCR layer. How can I transfer the layer to the other one? I have already installed Ghostscript, but I don't know what to do next.
How to Use Ghostscript
There's no such thing as an 'OCR layer' in PDF.
Most likely what you have is a PDF file which has a scanned image and the text extracted from that image using OCR which has been drawn as 'invisible' text (text rendering mode 3).
In general you can't copy and paste text between PDF files, so it's very hard to do what you are asking. I don't know of any tools which will help you here, and I can say for certain that Ghostscript will not help you at all.
Most likely you will also need to copy the Font (or CIDFont) from the PDF file as well, and if it has a ToUnicode CMap you'll definitely also want that or search won't work (and there's little point in this sort of OCR otherwise).
Since you have a PDF file which includes the OCR'ed text, why not simply use that PDF? I can't see any reason why you would want to 'transfer' it to another PDF file.
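If you want to check that a file really uses invisible text as described above, a rough sketch is to look for the Tr (text rendering mode) operator in a page's content stream. This assumes you already have the decoded (uncompressed) content stream bytes from some PDF library; the sample stream and function names are made up, and a plain regex can be fooled by the same characters appearing inside a string literal.

```python
import re

# Matches "<integer> Tr", the PDF operator that sets the text rendering mode.
TR_OPERATOR = re.compile(rb"(?<![\w.])(\d+)\s+Tr\b")


def text_render_modes(content):
    """Return every text rendering mode set in a decoded content stream."""
    return [int(m.group(1)) for m in TR_OPERATOR.finditer(content)]


def has_invisible_text(content):
    # Mode 3 means "neither fill nor stroke", i.e. the text is not painted.
    return 3 in text_render_modes(content)


# Hypothetical content stream: a scanned image followed by invisible text.
sample = b"q 612 0 0 792 0 0 cm /Im0 Do Q BT 3 Tr /F1 12 Tf 72 700 Td (Hello) Tj ET"
print(text_render_modes(sample))
print(has_invisible_text(sample))
```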

Ghostscript PDF Transparent Objects Removal

I have several PDF files containing recurring transparent objects (text).
(They also contain non-transparent objects (text and vectors) as well as images.)
These are not watermarks made by Acrobat or similar tools. They sit in the background as styling.
Removing them manually is impossible, since they are mixed (grouped) with the other content on the page.
Is there a way to set the opacity of translucent objects to 0, or even to remove them completely from the PDF, with Ghostscript?
Using Adobe Acrobat preflight and removing them as images removes all of the images in the PDF, instead of only the transparent objects.
How can this be achieved with the help of Ghostscript and the appropriate PostScript code?
These answers
https://stackoverflow.com/a/29657475/9921462
https://stackoverflow.com/a/37858893/9921462
were helpful for learning how to get objects and images, but they do not filter specifically for transparent objects.
Any ideas are appreciated as well.
I suspect that the reason you can't remove the objects using Acrobat is not that they are transparent. It's much more likely that they are described by a Form XObject. Those can't normally be edited by Acrobat.
You can use -dNOTRANSPARENCY and the pdfwrite device to produce a new PDF file with no transparency, but that will eliminate all the transparency in the output file.
Fundamentally there's no real way to do what you want, other than by manually editing the PDF file in an editor. You should go back to the original document and do the work there. PDF is not intended as an editable format.
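For reference, the -dNOTRANSPARENCY run mentioned above could be put together like this. It is only a sketch: the file names are placeholders, and the executable name "gs" is an assumption (on Windows it is typically gswin64c). As said, this affects all transparency in the file, not just the objects you want to get rid of.

```python
import shlex


def flatten_transparency_command(src, dst, gs="gs"):
    """Build a Ghostscript command line that rewrites src with pdfwrite
    and transparency disabled. File names and the gs name are placeholders."""
    return [
        gs,
        "-o", dst,             # output file (also implies batch, no-pause mode)
        "-sDEVICE=pdfwrite",   # write a new PDF
        "-dNOTRANSPARENCY",    # drop ALL transparency, not only selected objects
        src,
    ]


cmd = flatten_transparency_command("input.pdf", "output.pdf")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) if Ghostscript is on the PATH.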

How is hidden text stored in OCR-enhanced PDF files

// EDIT 26.03.2018 - Anyone who wants to continue my work can have a look at my source files: https://github.com/n0l0cale/ocr-sampledata
I'm looking for some details about PDF files. It's most important to me that the files remain usable for a very long time and, if possible, that OCR is applied automatically to new files (which does not really seem to be possible with Adobe Acrobat...).
To that end I've been looking at different solutions for OCRing my PDF files. I found three candidates which seem to do what they should (more or less). All three variants have their pros and cons, and each seems to take a different approach to storing the data in the PDF file. Let me explain:
a File OCRed with Adobe Acrobat:
https://github.com/n0l0cale/ocr-sampledata/blob/master/A4%20sample_ACROBAT.pdf
results in a file that Acrobat is able to open in one step (no preloading of any background layer) and after a preflight-script I'm able to see the text which is stored hidden:
a File OCRed with ABBYY FineReader:
https://github.com/n0l0cale/ocr-sampledata/blob/master/A4%20sample_ABBY.pdf
does not seem suitable for the default Adobe preflight script, as it does not display any additional layers:
But as far as I was able to tell, these files seem to have a background text layer which contains the OCRed text and lies underneath the image that is finally shown to the user. Unfortunately this layer seems to be loaded separately, which is confusing when opening the file with Adobe Acrobat...
a File OCRed with Tesseract 4 (Alpha):
https://github.com/n0l0cale/ocr-sampledata/blob/master/A4%20sample_TESSERACT_oem2.pdf
is also doing some weird magic with the hidden text part:
But in all three cases I'm able to search for words in the files and see the text using "Remove hidden information" and selecting "hidden text":
I'm seriously confused... Does anyone know how these programs really store their hidden text information?
S.
P.S.: For those wondering what this ominous preflight script is: https://theblog.adobe.com/hidden-gems-in-acrobat-dc-how-to-optimize-hidden-ocr-text/
Does anyone know how these programs are storing their hidden text information really?
You have correctly found out that the approach of ABBYY FineReader is different from that of Adobe Acrobat and of Tesseract:
ABBYY creates a page content stream in which the text is first drawn normally on the page and then covered by the scanned image.
Acrobat and Tesseract create content streams in which the image is drawn first and then the text is drawn invisibly (using text rendering mode 3, which draws nothing).
The difference between the latter two results is the choice of font used:
Acrobat uses regular standard 14 fonts, for which a PDF viewer has a font program to render them as normal glyphs.
Tesseract uses a font named GlyphLessFont, for which it embeds a font program in the result file. When rendered, the glyphs in this font do not show as our normal Latin glyphs but merely as empty space.
Considering the visual effect you observed for the ABBYY result, the approach used by Acrobat or Tesseract might be preferable.
Whether one prefers fonts with visually recognizable glyphs (as used by Acrobat) or without (as used by Tesseract), is mostly a mere matter of taste. They are used only in the invisible rendering mode anyways.
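The two layouts described above can be told apart by the drawing order in the page content stream. Here is a rough sketch of that check, assuming you have the decoded content stream bytes from some PDF library. The function name and the sample streams are made up, the regexes ignore string literals and nested Form XObjects, and Do can also paint things other than a scanned image.

```python
import re

XOBJECT_DRAW = re.compile(rb"/\w+\s+Do\b")          # paints an XObject (e.g. the scan)
TEXT_BEGIN = re.compile(rb"\bBT\b")                  # start of a text object
INVISIBLE = re.compile(rb"(?<![\w.])3\s+Tr\b")       # text rendering mode 3


def classify_ocr_layout(content):
    """Guess how hidden OCR text is stored, from a decoded content stream."""
    image = XOBJECT_DRAW.search(content)
    text = TEXT_BEGIN.search(content)
    if image is None or text is None:
        return "unknown"
    if text.start() < image.start():
        # Text is painted first and later covered by the scan.
        return "text under image"
    if INVISIBLE.search(content):
        # Scan is painted first, text follows in rendering mode 3.
        return "invisible text over image"
    return "visible text over image"


# Hypothetical, heavily simplified content streams.
text_first = b"BT /F1 12 Tf 72 700 Td (Hello) Tj ET q 612 0 0 792 0 0 cm /Im0 Do Q"
image_first = b"q 612 0 0 792 0 0 cm /Im0 Do Q BT 3 Tr /F1 12 Tf 72 700 Td (Hello) Tj ET"
print(classify_ocr_layout(text_first))
print(classify_ocr_layout(image_first))
```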

PDF Creator - Difference in the PDF Quality vs Adobe Acrobat. How can I change it?

When I use PDF Creator to create PDF documents, the quality of the fonts is not exactly the same as when I use Adobe Acrobat to create the same PDF. The fonts created with PDF Creator are a bit fuzzier (not as crisp as with Adobe).
Does anyone know if/how I can resolve this?
Here are 2 example documents that demonstrate what I mean:
Example of PDF created with PDF Creator
Example of PDF created with Adobe Acrobat
I don't have a solution for you, unfortunately, but I can tell you that what you are seeing is anti-aliasing. If anti-aliasing is enabled, fonts at lower resolutions will get that "fuzziness" that some people believe helps with reading. It might not look as pretty, but it improves word recognition (so the theory goes). But that's beside the point. What you need to do is look for a setting to disable anti-aliasing. If you can't find it, you might have to look into setting actual Ghostscript options, possibly -dTextAlphaBits, but I'm not a Ghostscript expert.
You can tell it's anti-aliasing because the "fuzziness" only appears when the fonts are small. Once you zoom in it all goes away.
Image zoomed out:
Image zoomed in
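For anyone who wants to experiment with the Ghostscript option mentioned above, here is a sketch of a command line that renders a page to PNG with a chosen -dTextAlphaBits value. Note the assumptions: this only controls how Ghostscript itself rasterizes text (1 turns text anti-aliasing off, 4 is the strongest), it says nothing about how PDF Creator is configured or how a viewer draws the PDF, and the file names and the "gs" executable name are placeholders.

```python
def render_command(src, dst, text_alpha_bits=4, dpi=96, gs="gs"):
    """Build a Ghostscript command that rasterizes src to a PNG file.
    Ghostscript accepts 1, 2 or 4 for TextAlphaBits; 1 disables anti-aliasing."""
    if text_alpha_bits not in (1, 2, 4):
        raise ValueError("text_alpha_bits must be 1, 2 or 4")
    return [
        gs,
        "-o", dst,
        "-sDEVICE=png16m",                       # 24-bit RGB PNG output
        f"-r{dpi}",                              # render resolution in dpi
        f"-dTextAlphaBits={text_alpha_bits}",    # text anti-aliasing level
        src,
    ]


# Compare a crisp (aliased) and a smoothed rendering of the same file.
print(render_command("sample.pdf", "aliased.png", text_alpha_bits=1))
print(render_command("sample.pdf", "smooth.png", text_alpha_bits=4))
```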

PDF Colo(u)r Analysis (without Acrobat itself ?)

Is there a library/tool which would list all colours used in a PDF document?
I'm sure Acrobat itself would do this but I would like an alternative (ideally something that could be scripted).
So the idea is that if you have a very simple PDF document with four colours in it, the output might say:
RGB(100,0,0)
RGB(105,0,0)
CMYK(0,0,0,1)
CMYK(1,1,1,1)
You could explore the insides with pdfbox, but you would have to write some code to find and catalog all those colors.
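To give an idea of what that cataloging code involves, here is a rough sketch (in Python rather than with pdfbox) that scans a decoded page content stream for the device color operators g/G, rg/RG and k/K. It assumes you already have the decoded stream bytes from whatever library you use. It is far from complete: it splits on whitespace, so it does not handle string literals properly, and it ignores cs/scn color spaces, images, shadings and Form XObjects. All names here are made up.

```python
# PDF color-setting operators: name -> (color space, number of operands).
COLOR_OPERATORS = {
    "g": ("Gray", 1), "G": ("Gray", 1),
    "rg": ("RGB", 3), "RG": ("RGB", 3),
    "k": ("CMYK", 4), "K": ("CMYK", 4),
}


def content_stream_colors(content):
    """Collect (space, components) pairs set by device color operators."""
    colors = set()
    operands = []
    for token in content.decode("latin-1").split():
        try:
            operands.append(float(token))   # numeric operand, keep collecting
            continue
        except ValueError:
            pass
        if token in COLOR_OPERATORS:
            space, arity = COLOR_OPERATORS[token]
            if len(operands) >= arity:
                colors.add((space, tuple(operands[-arity:])))
        operands.clear()                    # any non-number ends the operand run
    return colors


# Hypothetical stream: a red filled rectangle and a black CMYK stroked line.
sample = b"1 0 0 rg 0 0 100 100 re f 0 0 0 1 K 10 10 m 90 90 l S"
for space, values in sorted(content_stream_colors(sample)):
    print(f"{space}({','.join(f'{v:g}' for v in values)})")
```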
Most PDF tools have access to this information but no API to expose it. You could take any tool and add it in.
Apago PDFspy generates an XML file containing all kinds of metadata extracted from PDF files. It reports color usage including spot colors.
We recently added a function called GetPageColorSpaces(0) to the Quick PDF Library - www.quickpdflibrary.com to retrieve much of the ColorSpace info used in the document.
Here is some sample output.
Resource,\"QuickPDFCS2eb0f578\",Separation,\"HKS 52 E\",DeviceCMYK,0.95,0,0.55,0
Resource,\"QuickPDFCSb7b05308\",Separation,\"Black\",DeviceCMYK,0,0,0,1
Resource,\"QuickPDFCSd9f10810\",Separation,\"Pantone 117 C\",DeviceCMYK,0,0.18,1,0.15
Resource,\"QuickPDFCS9314518c\",Separation,\"All\",DeviceCMYK,0,1,0,0.5
Resource,\"QuickPDFCS333d463d\",Separation,\"noplate\",DeviceCMYK,1,0,0,0
Resource,\"QuickPDFCSb41cafc4\",Separation,\"noprint\",DeviceCMYK,0,1,0,0
Resource,\"Cs10\",DeviceN,Black,Colorant,-1,-1,-1,-1
Resource,\"Cs10\",DeviceN,P1495,Colorant,-1,-1,-1,-1
Resource,\"Cs10\",DeviceN,CalRGB,Colorant,-1,-1,-1,-1
Resource,\"Cs10\",Separation,\"P1495\",DeviceCMYK,0,0.31,0.69,0
XObject,\"R29\",Image,,DeviceRGB,-1,-1,-1,-1
Disclaimer: I work at Atalasoft.
Our product, DotImage with the PDF Reader add-on, can do this. The easiest way is to rasterize the page and then just use any of our image analysis tools to get the colors.
This example shows how to do it if you want to group similar colors. The deployed example will only work for PNG and JPEG, but if you download the code, it's trivial to include the add-on and get PDF as well (let me know if you need help).
Source here:
http://www.atalasoft.com/cs/blogs/31appsin31days/archive/2008/05/30/color-scheme-generator.aspx
Run it here:
http://www.atalasoft.com/31apps/ColorSchemeGenerator
If you are working with specific and simple PDF documents from a constrained source then you may be able to find the colors by reading through the content stream. However this cannot be a generic solution.
For example PDF documents can contain gradients or transparency. If your document contains this type of construct then you are likely to end up with a wide range of colors rather than a specific set.
Similarly, many PDF documents contain bitmapped images. Given that these will need to be interpolated to be displayed at different resolutions, the set of colors in a displayed PDF may be larger than or different from (though obviously broadly similar to) that of the embedded bitmap.
Similarly many PDF documents contain constructs in multiple color spaces that are rendered into different color spaces. For example a PDF might contain a DeviceRGB bitmap, a line in an ICC based CMYK color and a Lab based rectangle. The displayed version might be in sRGB for display or CMYK for print. Each of these will influence the precise set of colors.
So the only 100% valid answer is going to be related to a particular render of a PDF at a particular resolution to a particular color space. From the resultant bitmap you can determine the colors that have been used.
There are a variety of PDF libraries that will do this type of render including DotImage (referenced in another answer) and ABCpdf .NET (on which I work).
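Once you have such a render, the counting step itself is simple. Here is a minimal sketch, assuming the renderer hands you raw 8-bit RGB pixel data (three bytes per pixel, no header or padding). The function name and the tiny sample buffer are made up, and this does not use the API of any of the libraries mentioned.

```python
from collections import Counter


def colors_in_rgb_buffer(data):
    """Count how many pixels use each color in a raw 8-bit RGB buffer."""
    if len(data) % 3 != 0:
        raise ValueError("RGB buffer length must be a multiple of 3")
    return Counter(tuple(data[i:i + 3]) for i in range(0, len(data), 3))


# Hypothetical 4-pixel render: two white pixels and two slightly different reds.
page = bytes([255, 255, 255, 255, 255, 255, 100, 0, 0, 105, 0, 0])
for (r, g, b), count in colors_in_rgb_buffer(page).most_common():
    print(f"RGB({r},{g},{b}): {count} px")
```

For a CMYK render the same idea applies with four bytes per pixel.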