How do you create a PDF for Kindle Direct Publishing with wkhtmltopdf

Kindle Direct Publishing, AKA CreateSpace, wants a PDF/X-1a in 6x9 format with 0.25" outside margins and 0.375" inside (gutter) margins, and I need help producing that. The Qt PDF generator does not distinguish inside and outside margins, so I have to set both sides to the larger value. I also need to know whether my CSS affects this, and if so, how. Setting --margin-left .375in --margin-right .375in gives me this error: "Currently all margin units must be the same". I am not sure what that means: why have separate left and right options if they must both be the same, and does this really have to apply to top and bottom as well? To get the file to build I added the same value for top and bottom, but those are not the margins I wanted. Can gs fix this, and if so, how?
I know that wkhtmltopdf currently only creates PDF version 1.4. Kindle does not seem to mind that on upload, but I have not published an upload yet, so I do not know whether they will accept it; I hope someone knows from experience. To be safe I also use Ghostscript to convert the file to PDF version 1.7. This is what I have currently:
PDF_Combine is a bash array of files:
PDF_Combine=("file1.html" "file2.html");
Update: KDP now wants .875in margins on both sides, which makes my content really small. How does CSS affect margins in a PDF? Can I set the margins to 0 in wkhtmltopdf and adjust them in my CSS instead, and if so, how: on the body?
wkhtmltopdf --margin-left .375in --margin-right .375in --margin-bottom .375in --margin-top .375in --page-width 6in --page-height 9in --load-error-handling ignore --javascript-delay 3333 --enable-forms --footer-center "[page]/[topage]" "${PDF_Combine[@]}" "/MyPath/MyFileName.pdf"
The Ghostscript command:
gs -dPDFA -dBATCH -dNOPAUSE -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile="/MyPath/MyOutPutFileName.pdf" "/MyPath/MyInPutFileName.pdf"

PDF/X-1a is a highly specific, restricted set of PDF features; for instance, you cannot use the RGB colour model when producing a PDF/X-1a file.
You also need specific keys to be present in the Info dictionary of the PDF file.
Ghostscript does not create PDF/X-1a files. It can create PDF/X-3 files, but that's no good to you.
You can process a PDF file using Ghostscript to add white space around the page. What you need to do is specify a larger media size and tell Ghostscript it's a fixed size, so the PDF file can't change it. Then you need to offset the content up and right on the new media (because otherwise the content will be rendered in the bottom left corner). Note: it's not clear to me whether you want to increase the media size, or reduce the content so that it fits into the required media but has margins.
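As a rough sketch of that approach (larger fixed media plus an offset), assuming the target is a 6x9 inch page: the function names pad_pdf and in_to_pt are made up for illustration, and the sizes and offsets are only examples. The PageOffset values are assumed to be in points; this has reportedly varied between Ghostscript versions, so check the output. This only repositions content and does not make the file PDF/X-1a.

```shell
# Convert inches to whole PostScript points (72 points per inch), rounded.
in_to_pt() {
  LC_ALL=C awk -v inches="$1" 'BEGIN { printf "%d", inches * 72 + 0.5 }'
}

# Hypothetical helper: put the pages of $1 on fixed media of $3 x $4 inches,
# shift the content right by $5 and up by $6 inches, and write to $2.
pad_pdf() {
  local in_pdf=$1 out_pdf=$2
  local w h dx dy
  w=$(in_to_pt "$3"); h=$(in_to_pt "$4")
  dx=$(in_to_pt "$5"); dy=$(in_to_pt "$6")
  # Assumption: PageOffset is interpreted in points by this gs version.
  gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
     -dDEVICEWIDTHPOINTS="$w" -dDEVICEHEIGHTPOINTS="$h" -dFIXEDMEDIA \
     -sOutputFile="$out_pdf" \
     -c "<</PageOffset [$dx $dy]>> setpagedevice" \
     -f "$in_pdf"
}

# Example (not run here): 6x9in media, content moved 0.375in right, 0.25in up.
# pad_pdf in.pdf out.pdf 6 9 0.375 0.25
```

Because the media is fixed, anything pushed past the top or right edge by the offset is clipped, so the input pages need to be smaller than the target media by at least the offsets.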
See the answer and comments on the question here.

Related

Pdf Editing: change page size WITHOUT resizing content or rotate page WITHOUT rotating content

I receive postage labels from a supplier as single page pdf documents. The labels would fit on an A5 sheet but they are presented as a portrait within an A4 page, also in portrait orientation. I would like to be able to print two of these labels per A4 page to cut down on waste.
This can be achieved by rotating the page content without rotating the page itself, or by resizing the page by swapping the height and width about the content. I am aware that both of these things can result in content being lost, which isn't a problem for my use case. Ideally I'd like a command line application that works on both Linux and Windows machines. Unfortunately, web searches for "rotate" or "resize" pdf point to the many applications that just rotate or resize pdf pages along with the content, which isn't what I want.
Similar questions:
With PdfBox: identical use case, see my comments on PdfBox below.
With iText: almost identical use case, I explicitly don't want any resizing of the content. See my comments on iText below as well.
Things I have investigated or tried:
pdftk - too basic
ImageMagick - the original image contains transparency and the extent argument results in a visible loss of quality
pdfjam - also requires install of Latex and PdfPages. Ideally I'd like something that works on both Windows and Linux.
iText7 - the documentation isn't great. Looks like it was completely re-written in the last few years and the Nuget feed makes it clear that previous version, iTextSharp, is EOL. Consequently most of the examples one finds online (including on this site) are out of date. iText7 doesn't let you resize a page. I got as far as saving a document with a new page that was the right size but struggling to copy the content over. I think I could get what I wanted from this but it would take a long time and I'm trying to do something simple.
PdfBox - I've already tried one .NET library without success. Looking at the comments to the question I've linked above, this one seems to also have a version issue. I'm trying to do something really simple here, I will try this one if I exhaust all other avenues
Gimp - does what I want but I have to fire up the application, point and click quite a few times to rescale the image canvas, set the background and export
Screenshot the label from a pdf reader at 100% size and paste into a Word/LibreOffice doc. Sadly this is the most reliable method I have at the moment
I have example labels but they contain the name and address of people I've sent things to, I'd rather not upload them.
Try the command line tool cpdf from here: https://community.coherentpdf.com
cpdf -rotate-contents <angle> in.pdf -o out.pdf
to rotate contents without rotating the page. or...
cpdf -mediabox "100 100 600 500" in.pdf -o out.pdf
(and -cropbox and so on) to change page dimensions without altering content. Chapter 3 of the manual is of relevance.
You can also prepare the file by removing any page rotation whilst counter-rotating the content to leave the visual appearance unchanged:
cpdf -upright in.pdf -o out.pdf

How can I render a PDF into BMP fitting content to PDF page boundaries?

I am rendering a BMP from a PDF with Ghostscript, but the content is not fitted to the page boundaries. Whatever options I try, I cannot get the content to fit.
I've tried generating the BMP with different Ghostscript options, but none of them fits the content 100% correctly.
This is the last command I tried. Please don't expect it to do what I need; it is just copied and pasted from the tty.
gs -dBATCH -dNOPAUSE -sPAPERSIZE=a4 -dFIXEDMEDIA -dPSFitPage -sDEVICE=bmpmono -sOutputFile=Betlem.bmp -g1184x968 -c "<</PageSize [900 500]>> setpagedevice 0 0 translate" -c "<</PageOffset [-23 -100]>> setpagedevice" -f Betlem.pdf
I expect the content to fit the BMP image borders exactly, to the pixel. I am using an OpenCV & Python function to extract the content and fit it into a new image, and this is the debug output:
initial BMP image resolution = (872, 900)
BMP image resolution after fit content into new page = (541, 870)
Have a look at the following thread for the fitting function in Python:
I can't find a way to fit contour on new image zero point
You are using PSFitPage for a PDF file, you should be using PDFFitPage or just FitPage.
Note that the 'fitting' in this case is fitting the PDF media size to the existing media. If the PDF content leaves white space around the edge of the media, then the resulting scaling will include that.
In addition you are using PostScript to offset the page origin, which will introduce white space, and you are trying to change the media size, which won't work because you've set -dFIXEDMEDIA. Using these in combination with any of the FitPage switches is not likely to work well.
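A minimal sketch of the simpler invocation this points towards, assuming the goal is only to scale each page onto a bitmap of a fixed pixel size: the wrapper name render_bmp_fit is hypothetical, and the pixel dimensions in the example are simply the ones from the question. Any white space that is part of the PDF's own media will still be scaled along with the content.

```shell
# Hypothetical wrapper: render $1 to a monochrome BMP $2 of $3 x $4 pixels,
# scaling each PDF page to fit the fixed bitmap size. No PostScript
# PageSize/PageOffset snippets are mixed in.
render_bmp_fit() {
  local in_pdf=$1 out_bmp=$2 width_px=$3 height_px=$4
  gs -dBATCH -dNOPAUSE -sDEVICE=bmpmono \
     -g"${width_px}x${height_px}" -dFIXEDMEDIA -dPDFFitPage \
     -sOutputFile="$out_bmp" "$in_pdf"
}

# Example (not run here):
# render_bmp_fit Betlem.pdf Betlem.bmp 1184 968
```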
Randomly stabbing at controls and copying bits of code intended to solve different problems isn't likely to help you I'm afraid.
Without seeing an example file I can't, of course, tell you how to solve your problem, and I'm not really sure exactly what you are trying to achieve. A bitmap with no white space? A bitmap of a given size with no white space? Something else?
[Edit]
OK so looking at the PDF file, the media box is 11.69x8.27 inches, there is white space at the top, bottom, left and right between the marks on the page and the edge of the media.
Running this through Ghostscript, to TIFF at 72 dpi results in a file which Adobe Photoshop says is 11.694x8.264 inches and has white space at top bottom left and right, just like the PDF file.
By default Ghostscript uses the Media size from the PDF to render to, however you can change this. If you were to change the media size to (say) 5.8x4.14 inches, set -dFIXEDMEDIA and then rendered the PDF file what would happen is that the top and right hand side of the PDF file would be 'off the page' so you would only get the left hand portion rendered. Try this:
gs -dDEVICEWIDTHPOINTS=421 -dDEVICEHEIGHTPOINTS=298 -dFIXEDMEDIA "A betlem m en vull anar(1).pdf"
You will see the white space is still present at bottom and left, and the top and right have fallen off the page.
Now, if you add FitPage that will scale the original media down until it fits the new media size (and all the content too, of course). If you try:
gs -dDEVICEWIDTHPOINTS=421 -dDEVICEHEIGHTPOINTS=298 -dFIXEDMEDIA -dFitPage "A betlem m en vull anar(1).pdf"
You'll see that the output has the same physical dimensions as the previous command, but now the whole of the PDF content can be seen because it's been scaled down. You should also see that the distribution of white space has changed, because I didn't strictly divide by 2 in each direction. The FitPage switch scaled the content in both directions by the same amount, and distributed the extra space in the x direction evenly to each side, as new white space.
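The uniform scaling described here can be sketched numerically. This is only an illustration of the arithmetic (take the smaller of the two per-axis ratios), not Ghostscript's actual code, and the function name fit_scale is made up. All sizes are in the same unit, for example points.

```shell
# Uniform scale factor that fits a page of $1 x $2 into media of $3 x $4:
# the smaller of the two per-axis ratios, printed to three decimals.
fit_scale() {
  LC_ALL=C awk -v ow="$1" -v oh="$2" -v nw="$3" -v nh="$4" 'BEGIN {
    sx = nw / ow   # ratio along x
    sy = nh / oh   # ratio along y
    printf "%.3f", (sx < sy) ? sx : sy
  }'
}

# Example: a landscape A4 page (842 x 595 points) into 1184 x 968 media.
# fit_scale 842 595 1184 968
```

Unless the two ratios happen to be equal, one axis is left with spare room after scaling, which is why a fitted page generally cannot touch all four edges of the output at once.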
Now I've no clue what you mean by 'simmetric'. You can undoubtedly do what you want using Ghostscript and the PostScript language, but I don't know what it is you want. Pointing me at Python code isn't going to help I'm afraid, I don't speak Python.
I can say that Ghostscript does not add extra white space that isn't present in the original unless you mess with the rendering by adding parameters like FitPage and FIXEDMEDIA.
If you can explain what you are trying to achieve I can probably tell you what to do.

Use Ghostscript / PostScript to convert all text colours to black within a PDF

I want to convert the white text in this PDF into black text and generate a new PDF with the changed text.
I have found this
http://www.artifex.com/files/Ghostscript_Color_Architecture.pdf
which mentions settings like -sTextICCProfile but using black_output.icc from
http://www(dot)ghostscript.com/doc/toolbin/color/icc_creator/effects/
like so:
gs -o test.pdf -sTextICCProfile=black_output.icc out.pdf
does not change the text colour to black.
Is the usage of the .icc profile incorrect? Is it even the right approach?
Is there a way to achieve this with postscript?
Example PDF
The usage of the ICCProfile is correct...
However, that usage is for rendering, it has no effect on the pdfwrite device at all (because it doesn't render the input, it turns it into a PDF file). So no, this is not the correct approach.
There is no real means to do what you want with Ghostscript. Technically it's probably possible, but it wouldn't be easy. You also haven't apparently posted an example of the PDF file. It's entirely possible that the 'text' is not actually text. It may be an image, or vectors, which look like text.
There may also be transparency involved, which would complicate the matter still further.

Ghostscript loses emdash characters and replaces with hyphens

When I run a PDF which was originally created with LibreOffice on Linux, through ghostscript 9.19 on OSX, to produce another (flattened) PDF, the output is perfect except for one problem. All emdashes in the entire document have been replaced with a standard hyphen (awkwardly followed by half of a space.) Oddly enough, if I highlight the resulting "hyphen+space", my context menu shows that I've selected an emdash, so the underlying text is still an emdash, it is just rendering the wrong glyph.
I can reproduce this on multiple documents from the same source, and I'm assuming there's a setting or switch somewhere that can help resolve this.
I don't know whether the font used makes a difference, but for the sake of reference, the body text of my document is set in Arno Pro. When I use a modern version of LibreOffice on OS X to make a sample document also containing an emdash in Arno Pro, the same problem is not exhibited, so it seems to be specific to the software which originally made these PDF files.
These PDFs are of legacy projects that I am not set-up to re-produce at this time, so I need to prepare them for reprinting using the existing files.
How do I retain emdash glyphs when running a command such as the following?
gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite \
-sColorConversionStrategy=/LeaveColorUnchanged \
-dAutoFilterColorImages=true -dAutoFilterGrayImages=true \
-sOutputFile=output.pdf input.pdf
I can add an example of the input PDF to this question if needed.
Without seeing the PDF file it isn't possible to give you an answer. Most likely the font isn't embedded, or if it is embedded doesn't have an emdash glyph.
Copy and paste uses the ToUnicode CMap, so it isn't dependent on the font. It's simply a list of character codes and the Unicode code point associated with each, when using a given font.
Note that this doesn't mean 'the underlying text is still an emdash'. The ToUnicode information is utterly separate from the font end of things, it is effectively metadata and bears no real relation to the font or rendering.
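One way to check the first possibility (a font that isn't embedded) is sketched below. It assumes poppler's pdffonts tool is installed, which the question does not mention, and the filter name unembedded_fonts is made up. It cannot tell you whether an embedded font lacks the emdash glyph.

```shell
# Read `pdffonts` output on stdin and print the names of fonts whose
# "emb" column is "no". Counting fields from the right avoids trouble
# with multi-word font types such as "Type 1C": the last five fields
# are emb, sub, uni and the two-part object ID.
unembedded_fonts() {
  awk 'NR > 2 && NF >= 6 && $(NF - 4) == "no" { print $1 }'
}

# Example (not run here; assumes pdffonts is on the PATH):
# pdffonts input.pdf | unembedded_fonts
```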
Put the file on DropBox and post the URL and someone can look into it. I'll be on vacation for the next few days though, but maybe someone else will look.
Note that in PDF you don't necessarily specify characters and positions as a list of consecutive characters; you can specify the position of each individually, or you can specify widths which override the width in the font, etc. So there almost certainly is only one glyph; the 'white space' you refer to is probably just that, white space. It's not another glyph.
I should also point out (I do this a lot) that Ghostscript never 'flattens', concatenates, merges, or performs any similar operation on PDF files. When using Ghostscript and the pdfwrite device, the original input (in whatever format) is fully interpreted into graphics marking operations and sent to the device. The device executes the marking operations; in the case of a rendering device, it scan-converts and writes to a bitmap. In the case of pdfwrite, it creates PDF operators.
The result of this is that the output PDF file bears no relation to the input PDF, other than its visual appearance.
You also don't say which version of Ghostscript you are using....

Obey the MediaBox/CropBox in PDF when using Ghostscript to render a PDF to a PNG

I've been using Ghostscript to convert my single figure plots rendered in PDF to PNG:
gswin32c -sDEVICE=png16m -r300x300 -sOutputFile=junk.png ^
-dBATCH -dNOPAUSE Figure_001-a.pdf
This works in the sense I get a PNG out and it contains the plot.
But it contains a huge amount of white space as well (an example source image: http://cdsweb.cern.ch/record/1258681/files/Figure_001-a.pdf).
If you view it in Acrobat you'll note there is no white space around the plot. If you use the above command line you'll find the plot is only about 1/3 of the space.
When doing the same thing with an EPS file I run into the same problem. However, there is the command-line parameter -dEPSCrop that one can pass to get the PS rendering engine to pay attention to the BoundingBox.
I need a similar argument for rendering PDFs. I was not able to find one in the docs (nor even -dEPSCrop, actually).
I had exactly the same issue. I fixed it by adding -dUseArtBox switch.
Example:
/usr/bin/gs -dUseArtBox -dNOPAUSE -sDEVICE=pngalpha -sOutputFile=output.png input.pdf
Note: -dUseArtBox switch is supported since ghostscript version 9.07
-dUseArtBox
Sets the page size to the ArtBox rather than the MediaBox. The art box defines the extent of the page's meaningful content (including potential white space) as intended by the page's creator. The art box is likely to be the smallest box. It can be useful when one wants to crop the page as much as possible without losing the content.
There are various options to control which "media size" Ghostscript renders a given input to:
-dPDFFitPage
-dUseTrimBox
-dUseCropBox
With PDFFitPage Ghostscript will render to the current page device size (usually the default page size).
With UseTrimBox it will use the TrimBox (and it will at the same time set the PageSize to that value).
With UseCropBox it will use the CropBox (and it will at the same time set the PageSize to that value).
By default (given no parameter), Ghostscript will render using the MediaBox.
For your example, it looks like adding "-dUseCropBox" will do the job you're expecting.
Note, you can additionally control the overall size of your output by using "-sPAPERSIZE" (select amongst all pre-defined values Ghostscript knows) or (for more flexibility) use "-dDEVICEWIDTHPOINTS=NNN -dDEVICEHEIGHTPOINTS=NNN".
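Putting that together with the command from the question, a minimal sketch might look like this. The wrapper name render_png_cropbox is hypothetical, and on Windows the executable would be gswin32c rather than gs.

```shell
# Hypothetical wrapper: render $1 to a 24-bit PNG $2 at $3 dpi (default 300),
# using the CropBox instead of the MediaBox as the page size.
render_png_cropbox() {
  local in_pdf=$1 out_png=$2 dpi=${3:-300}
  gs -dBATCH -dNOPAUSE -sDEVICE=png16m -r"$dpi" -dUseCropBox \
     -sOutputFile="$out_png" "$in_pdf"
}

# Example (not run here):
# render_png_cropbox Figure_001-a.pdf junk.png 300
```

If the file has no CropBox, or it is the same as the MediaBox, this will not remove any white space; the TrimBox or ArtBox switches are the alternatives mentioned above.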
Have you tried using pdfcrop using pdftex (comes with texlive for example) or (not tried yet) the python script pdfcrop?
I have a similar workflow using the first tool mentioned.