Hi,
I'm using Ghostscript to convert PDFs of various formats to PNG images. My PDFs are either landscape or portrait.
I'm passing this command to gs (from C#):
string CmdArguments = string.Format("-o {0}%04d.png -sDEVICE=pngalpha -r600 -g2000x2000 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -c<</Orientation 3>> setpagedevice {1}", outputfilename, inputfilename);
But on every page, the right border is always cut off.
How can I fix this issue?
Many thanks :)
If you are expecting the page to be scaled to fit the specified fixed page size, then you need to tell Ghostscript to do so, which you haven't done.
By the way, <</Orientation 3>> setpagedevice as written isn't valid. It would also be a lot easier to understand if you quoted an actual complete command line rather than the parameters to a C# method; those of us who don't grok C# might understand it better. You've put a '-c' in there to treat what follows as PostScript, but there's no -f to terminate PostScript processing before you reach the input filename. Frankly, I'm surprised this does anything at all.
Try adding -dPDFFitPage.
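A minimal sketch of the suggested fix, written in Python as an argument list for subprocess so no shell quoting is involved. The function name build_png_args is hypothetical; the switches are the ones from the question plus -dPDFFitPage. If the Orientation request is kept, the -c section is closed with -f so the input file name is read as a file again. Whether this removes the cropping on every page of your files is an assumption worth checking on a sample.

```python
from typing import Optional


def build_png_args(input_pdf: str, output_prefix: str,
                   orientation: Optional[int] = None) -> list[str]:
    """Build a gs argument list that renders every page of input_pdf to PNG,
    scaling each page to fit the fixed 2000x2000-pixel canvas."""
    args = [
        "gs",  # "gswin64c" on 64-bit Windows
        "-o", f"{output_prefix}%04d.png",  # %04d is replaced by the page number
        "-sDEVICE=pngalpha",
        "-r600",
        "-g2000x2000",   # fixed output size in pixels
        "-dPDFFitPage",  # scale each page to fit that fixed size instead of clipping
        "-dTextAlphaBits=4",
        "-dGraphicsAlphaBits=4",
    ]
    if orientation is not None:
        # -c switches to PostScript; -f switches back so that the
        # next argument is treated as a file name again.
        args += ["-c", f"<</Orientation {orientation}>> setpagedevice", "-f"]
    args.append(input_pdf)
    return args
```

For example, subprocess.run(build_png_args("in.pdf", "page"), check=True) would write page0001.png, page0002.png and so on.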
In this PDF, the drawings on the second-to-last page apparently use a 0.00pt line width. This makes them almost unreadable on-screen, and completely invisible when printed.
Is there a relatively painless way to change these "no width" lines to have some width? There are lots of small details, so converting to an image will not retain enough detail unless an outlandish resolution is used, and then the "no width" issue re-emerges.
I've installed Ghostscript and ran pdf2ps in.pdf med.ps, then ps2pdf med.ps out.pdf, and the line weights are exactly the same. Next, I opened med.ps in a text editor, hoping I could make a Python script "find and replace" these zero line widths, but I'm not seeing anything like "0 w" in the file. Perhaps it is defined in a macro somewhere, but I can't find it.
This idea came from "Change the width of all lines in a PDF programmatically" and "Thicken line weights when printing PDF".
Your best bet is to use a tool to decompress the PDF file (e.g. using MuPDF: mutool clean -d in.pdf out.pdf, or with Ghostscript: gs -sDEVICE=pdfwrite -o out.pdf -dCompressPages=false in.pdf), then use a text editor or a scripting tool such as sed to look for "0 w" and replace it with something else.
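As a rough illustration of the search-and-replace step, here is a Python sketch (the function name thicken_zero_width_lines and the default replacement width of 1 are illustrative assumptions, not part of any tool). It works on an already decompressed file and pads each replacement with spaces so every match keeps its byte length, which leaves the xref offsets and stream /Length values valid. It is a naive byte-level search: it can also match inside text strings or uncompressed binary data, and whether 1 is a sensible width depends on the page's coordinate scaling, so check the result visually.

```python
import re
import sys

# PDF whitespace and delimiter characters (ISO 32000-1, section 7.2.2).
_DELIMS = rb"\s()<>\[\]{}/%"

# A zero-valued number ("0", "0.0", ".0", "00") followed by the "w" operator.
# The lookbehind/lookahead demand token boundaries, so "10 w" and "0.5 w" do not match.
ZERO_WIDTH_RE = re.compile(
    rb"(?<![^" + _DELIMS + rb"])(0+\.?0*|\.0+)(\s+w)(?![^" + _DELIMS + rb"])"
)


def thicken_zero_width_lines(data: bytes, new_width: bytes = b"1") -> tuple[bytes, int]:
    """Replace zero "w" operands in decompressed PDF bytes with new_width.

    Each replacement is padded with spaces to the length of the original
    number, so byte offsets (xref table, stream /Length) stay unchanged.
    Numbers too short to hold new_width are left alone and not counted.
    """
    count = 0

    def repl(m):
        nonlocal count
        token, operator = m.group(1), m.group(2)
        if len(new_width) > len(token):
            return m.group(0)  # cannot fit without shifting byte offsets
        count += 1
        return new_width.ljust(len(token)) + operator

    return ZERO_WIDTH_RE.sub(repl, data), count


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python thicken.py decompressed.pdf edited.pdf
    with open(sys.argv[1], "rb") as src:
        edited, n = thicken_zero_width_lines(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(edited)
    print(f"replaced {n} zero-width line setting(s)")
```

The edited file can then be recompressed with pdfwrite, as described below.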
PDF, unlike PostScript, isn't a programming language, so you can reliably search a PDF file for operator usage like this. Trying to do the same in a PostScript file is, as beginner6789 says above, extremely hard.
If you then want the final file compressed, you could run the edited file through Ghostscript's pdfwrite device using something like gs -sDEVICE=pdfwrite -o final.pdf edited.pdf.
You absolutely should not use Ghostscript's ps2write device to produce PostScript; the PostScript imaging model is not entirely compatible with PDF, and any PDF constructs which cannot be represented in PostScript (such as any kind of transparency) will be rendered to an image. Really, don't do this.
This could be a problem if many different line weights are used and you only want to change the 0.0-width lines. If they are all 0.0, then placing this early in the page could work, unless the PostScript looks the operator up directly in the system dictionaries:
/setlinewidth {pop} def
The default line width in my Ghostscript is 1.0, so that should be used automatically instead of the 0.0 line width.
The output of pdf2ps usually contains a lot of PDF-style dictionaries, so finding the code used for setlinewidth can be confusing, but setlinewidth must be in there somewhere. Some people like to read PostScript.
PDF files aren't really meant to be edited, so I use these options to make the final PDF easier to read, in case there is some useful information to look at: -dCompressPages=false -dCompressStreams=false
EDIT: Depending on the code used to create the original PostScript, there might be definitions like this:
dup/LW//knownget exec{
setlinewidth
}if
/w/setlinewidth load def
So, as in this simple example, LW or w could be used in place of setlinewidth. Most are not this simple.
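If you do go digging in pdf2ps output, a small script can list the simplest kinds of alias. This Python sketch is hypothetical and heuristic: it only recognises the /w/setlinewidth load def form shown above and a procedure wrapper such as /lw { setlinewidth } bind def. The dup/LW//knownget form is a different case (it looks up an LW entry in a dictionary; LW is the line-width key in PDF graphics-state dictionaries) and is not detected.

```python
import re

# Characters that cannot appear in a PostScript name (whitespace and delimiters).
_NAME = r"[^\s/\[\]{}()<>%]+"

# /w /setlinewidth load def       -> binds the operator itself to a new name
LOAD_ALIAS_RE = re.compile(r"/(" + _NAME + r")\s*/setlinewidth\s+load\s+def")
# /lw { setlinewidth } bind def    -> wraps the operator in a procedure
PROC_ALIAS_RE = re.compile(r"/(" + _NAME + r")\s*\{\s*setlinewidth\s*\}\s*(?:bind\s+)?def")


def find_setlinewidth_aliases(ps_text: str) -> list[str]:
    """Return the sorted names a PostScript file defines as simple aliases of setlinewidth."""
    names = set(LOAD_ALIAS_RE.findall(ps_text))
    names.update(PROC_ALIAS_RE.findall(ps_text))
    return sorted(names)
```

For example, find_setlinewidth_aliases(open("med.ps", encoding="latin-1").read()) would return the names to look for next to a 0 operand.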
EDIT2: There is some good info here:
How to change the width of lines in a PDF/PostScript file
We have already created single-page PDFs with trim and bleed boxes, converted to greyscale using an ICC profile. We then use Ghostscript to combine them into a multi-page PDF; however, after combining, the trim and bleed boxes disappear and the greyscale reverts to color. We can use Ghostscript's greyscale conversion, but that doesn't help with the trim/bleed boxes, which we need for imposition.
This is what we are using:
$command = 'gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -sOutputFile="' . $outputPath . '" ' . implode(' ', $pdfFiles);
We'd be glad of any help or suggestions. We process a high volume, so we are currently using PDFtk to combine the files; that keeps the boxes but doesn't fix the greyscale issue either.
You have not stated which operating system you are using, nor which version of Ghostscript, and you haven't supplied an example file.
The pdfwrite device goes to considerable effort not to alter the colour space or values of the input. If the input is in DeviceGray, then the output will be in DeviceGray, unless you specifically request a different space using the ColorConversionStrategy switch. What exactly do you mean by "the greyscale reverts to colour"? Does the PDF display differently? Does some other tool report that the file is 'colour'?
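A sketch of what explicitly requesting grey output could look like, written as a Python argument list rather than the PHP string from the question. The function name is hypothetical; -sColorConversionStrategy=Gray and -dProcessColorModel=/DeviceGray are pdfwrite options, but check them against the documentation for your Ghostscript version. Passing a list to subprocess also keeps file names containing spaces as single arguments. It does nothing about the TrimBox/BleedBox question.

```python
from typing import Sequence


def build_gray_pdfwrite_args(output_path: str, pdf_files: Sequence[str],
                             gs: str = "gs") -> list[str]:
    """Build a gs argument list that runs several PDFs through pdfwrite
    and asks it to convert all colour to DeviceGray."""
    return [
        gs,  # e.g. "gswin64c" on 64-bit Windows
        "-sDEVICE=pdfwrite",
        "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sColorConversionStrategy=Gray",
        "-dProcessColorModel=/DeviceGray",
        f"-sOutputFile={output_path}",
        *pdf_files,  # each file name stays one argument, spaces included
    ]
```

Usage would be along the lines of subprocess.run(build_gray_pdfwrite_args("out.pdf", ["a.pdf", "b.pdf"]), check=True).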
There's really nothing anyone can suggest without a lot more information, in particular an example input file and ideally the file after you've run it through Ghostscript using the pdfwrite device.
Please note that Ghostscript's pdfwrite device does not 'combine' PDF files. The actual process is complex, and though the end result may appear to be the original files 'combined', that's not what is going on behind the scenes. The actual process is documented here.
There have already been quite a few questions and answers here about cropping documents with Ghostscript.
However, the answers don't match my exact needs and are still confusing to me.
I expected there would be a single option, e.g. "-AutoCropToBBox" or something similar.
To clarify, by bounding box I mean the smallest rectangle that completely contains all (non-white?) printed objects.
Furthermore, I want (and have) to use printer port redirection (RedMon) to generate a cropped PDF by printing to a PostScript printer from basically any application.
So, under Win7/64bit, I set the redirected port properties:
The output is redirected to
C:\Windows\system32\cmd.exe
The arguments for the program are:
/c gswin64c.exe -sDEVICE=pdfwrite -o -sOutputFile="%1".pdf -
"%1" contains the user input for filename. With this, I get a full-page PDF. Fine!
But how to add the cropping options?
Additional question:
If I have a multipage document, will such (auto-)cropping be applied to each page individually? Or is there an option to keep all pages the same, e.g. the size of the first page or of the largest bounding box across all pages?
Another related issue:
The window prompting for the filename always pops up behind the application I am printing from. Any ideas how to always bring it to the front?
Another question:
There is the Perl script "ps2eps" and the program bbox.exe (see http://ctan.org/pkg/ps2eps). It says there that Ghostscript (or ps2epsi) occasionally calculates wrong bounding boxes. Is this (still) true?
Thanks for your help.
Well, your first problem is that PostScript programs are normally written to expect to be rendered to a specific media size, and are usually not tightly bounded to it. White space is important for readability.
So ordinarily the PostScript program you generate will request a specific media size, and the interpreter will do its best to match that. If it can't match it, then it will use a strategy to get as close as possible, and scale the entire content to fit that media.
You can't expect the printer to do any of that if it doesn't know the required size until it's finished, and you can't be certain of the bounding box until you have rendered all the marking content. It is true that some files, generally EPS files, have a %%BoundingBox comment, but that's a comment: it has no effect in PostScript. It's there for the benefit of applications which don't want to interpret the PostScript.
So that's why the simple switch you want isn't there: it would break the interpreter's normal rendering behaviour.
So, the first thing you need to do is determine the bounding box of the content. You can do that, as Stefan says, by using the bbox device. And on that note, as far as I know the bbox device produces accurate output. If it does not, then we would appreciate a bug report proving it so we can fix it. If people don't report bugs, how are we supposed to know about them? It's disappointing to see someone spreading FUD instead of helping out with a bug report.
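Since the bbox device writes a %%BoundingBox line for each page to stderr, its output is easy to post-process. A Python sketch, assuming gs is on the PATH (gswin64c on 64-bit Windows) and using hypothetical helper names: run_bbox captures the device output, parse_bbox_output returns one box per page, and union_bbox computes the smallest box containing all of them, which is one way to get a single size for every page.

```python
import re
import subprocess

# One "%%BoundingBox: llx lly urx ury" line per page (HiRes lines are ignored).
BBOX_RE = re.compile(r"^%%BoundingBox:\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)", re.MULTILINE)


def run_bbox(path: str, gs: str = "gs") -> str:
    """Run Ghostscript's bbox device over a file and return what it wrote to stderr."""
    result = subprocess.run(
        [gs, "-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=bbox", path],
        capture_output=True, text=True, check=True,
    )
    return result.stderr


def parse_bbox_output(text: str) -> list[tuple[int, int, int, int]]:
    """Return one (llx, lly, urx, ury) tuple per page, in page order."""
    return [tuple(int(v) for v in m.groups()) for m in BBOX_RE.finditer(text)]


def union_bbox(boxes):
    """Smallest box that contains every page's box (for a common page size)."""
    if not boxes:
        raise ValueError("no %%BoundingBox lines found")
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))
```

For per-page cropping, use the list from parse_bbox_output directly; for a common size, pass it to union_bbox.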
ps2epsi isn't Ghostscript; it's a crappy, cheap and cheerful script, and I wouldn't use it. However, if the original PostScript leaves stuff on the stack, it will end up as a corrupted (or invalid) EPS file. The original PostScript should be fixed before you try to use it, as it will break any PostScript program that tries to use it (e.g. if you include the EPS in a document and then print it).
So if you are using Ghostscript and you want to take a PostScript program and get an EPS out of it, use the eps2write device. It won't have a preview, but frankly, who cares?
Now, if I remember correctly, the bbox device (and eps2write) record all marking operations. You can't simply record only the non-white marking operations: what if the white overwrites an existing mark on the page? What if the media is not white? Note that if you render to a PNG with Ghostscript, the untouched portion of the output is transparent, whereas white marks are not.
So the bbox is the extent of all the marking operations, regardless of colour. The only other way to proceed would be to render the content and count the non-white pixels. But that only works at a specific resolution; change the resolution and the precise bounding box may change as well.
Once you have the bounding box, you can tell Ghostscript to use media of that size. Note that you will almost certainly also have to translate the origin, as it's unlikely that the content will start exactly at the bottom left corner. You will need -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS to set the media size, and you will need to use -c and -f to send PostScript that alters the origin appropriately. In simple cases a '-x -y translate' will suffice, but if the program executes initgraphics you will instead have to set a BeginPage procedure to alter the initial CTM.
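The arithmetic is simple: the media is (urx - llx) by (ury - lly) points, and the origin moves by -llx -lly. Here is a Python sketch that turns a %%BoundingBox into those switches. crop_args is a hypothetical name, and -dFIXEDMEDIA is an assumption that is not part of the answer, added so that the job's own page size request doesn't override the media size. The translate only survives if the program does not execute initgraphics; otherwise a BeginPage procedure is needed instead.

```python
def crop_args(bbox, ps_file: str, out_pdf: str, gs: str = "gs") -> list[str]:
    """Build a pdfwrite command whose media is exactly bbox = (llx, lly, urx, ury)."""
    llx, lly, urx, ury = bbox
    width, height = urx - llx, ury - lly
    if width <= 0 or height <= 0:
        raise ValueError(f"empty bounding box: {bbox}")
    return [
        gs, "-sDEVICE=pdfwrite", "-o", out_pdf,
        f"-dDEVICEWIDTHPOINTS={width}",    # media width in points (1/72 inch)
        f"-dDEVICEHEIGHTPOINTS={height}",  # media height in points
        "-dFIXEDMEDIA",  # assumption: keep the job's own page size from overriding this
        "-c", f"{-llx} {-lly} translate",  # move the bbox's lower-left corner to the origin
        "-f", ps_file,                     # -f ends the PostScript section
    ]
```

For a box of 36 50 576 742 this gives -dDEVICEWIDTHPOINTS=540, -dDEVICEHEIGHTPOINTS=692 and -36 -50 translate.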
If you set the media size with -dDEVICEWIDTHPOINTS etc., then all pages will be the same size. If you don't want that, you need to write a BeginPage procedure to resize each page individually (you will also need to hook setpagedevice and remove the /PageSize entries from the dictionary).
I've no idea why Windows is putting the dialog box behind the active window; it seems to have started doing that with Windows 7 (or possibly Vista). I don't see any way to alter that, because I'm not sure what is generating the dialog.
Personally I would suggest that you try the 2-step approach of running the original through Ghostscript's eps2write device and then take the EPS and create a PDF file using the pdfwrite device and the -dEPSCrop switch. Double converting is bad, but other solutions are worse. Note that EPS files cannot be multi-page, so you will have to create 'n' EPS files from an n-page PostScript program, and then supply a command line listing each EPS file as input to the pdfwrite device.
Take an example file and try this out from the command line before you try scripting it.
As I understood from @KenS's explanations:
the way eps2write works, it may not (or actually cannot) produce the minimum possible bounding box
it needs to be a 2-step process via -sDEVICE=bbox
So I ended up with the following process to "print" a PDF with the correct minimum bounding box:
Redirected Printer Port to cmd.exe
C:\Windows\system32\cmd.exe
Arguments for the program:
/c gswin64c.exe -q -o "%1".ps -sDEVICE=ps2write - && gswin64c.exe -q -dBATCH -dNOPAUSE -sDEVICE=bbox -dLastPage=1 "%1".ps 2>&1 >nul | perl.exe C:\myFiles\CropPS2PDF.pl "%1"
Unfortunately, it requires a little Perl script (let's call it CropPS2PDF.pl):
#!/usr/bin/perl
use strict;
use warnings;
my $FileName = $ARGV[0];
local $/ = undef; # slurp all of STDIN (the bbox output) at once
my $Crop = <STDIN>;
# get the bbox coordinates of the first page (the bbox run uses -dLastPage=1)
my ($llx, $lly, $urx, $ury) = $Crop =~ /%%BoundingBox: (-?\d+) (-?\d+) (-?\d+) (-?\d+)/
    or die "No %%BoundingBox line in the bbox output\n";
print "\n$FileName: $llx, $lly, $urx, $ury \n"; # print just to check
my $Command = qq{gswin64c.exe -q -o "$FileName.pdf" -sDEVICE=pdfwrite -c "[/CropBox [$llx $lly $urx $ury] /PAGE pdfmark" -f "$FileName.ps"};
print $Command; # print just to check
system($Command) == 0 or die "Ghostscript failed: $?\n"; # execute command
It seems to work... :-)
Improvements are welcome.
My questions are still:
Can this be done somehow without Perl? Just Win7, cmd.exe and Ghostscript?
Is there a way to avoid writing the PS file, which I do not need, to disk? Of course, I could also delete it afterwards with the Perl script.
I'm trying to convert a PDF to PDF/A. On every attempt I get the error "GPL Ghostscript 9.19: Annotation set to non-printing, not permitted in PDF/A, reverting to normal PDF output".
The PDF has previously been generated from HTML by wkhtmltopdf. With the error being quite vague, I've done some research on PDF annotations. I've confirmed the PDF has no annotations; flattening annotations (though there aren't any) hasn't worked, and I tried the -dShowAnnots=false switch. All to no avail. I've also tried it with a variety of different PDFs and I get the same error on all of them.
The command I'm using to do the conversion is "gs -dPDFA=2 -dNOOUTERSAVE -sProcessColorModel=DeviceRGB -sDEVICE=pdfwrite -o output.pdf /Users/work/Documents/Projects/pdf-generator-service-tests/PDFA_def.ps -dPDFACompatibilityPolicy=1 input.pdf"
I tried creating a basic PDF page from Google's homepage using wkhtmltopdf https://google.com output.pdf, and again, no joy. This is an example of the PDFs I've tried to convert, for anyone who wants to try to replicate the issue.
I thought the error was quite specific: PDF/A does not permit annotations to be set to non-printing. You haven't included an actual example of the kind of file causing you a problem, so I can't possibly comment on the presence of any annotations, but I assure you that it's not possible to get this message without having annotations.
Since you've already set PDFACompatibilityPolicy to 1, there's not much else I can say. You could open a bug report and attach the file there, or post a link to one here. Without that I can't say much.
Oh, and you don't say which version of Ghostscript you are using, or where you sourced it from. Occasionally packagers break things, so it might be worth trying to build from source.
One point: you execute the PDFA_def.ps file before setting -dPDFACompatibilityPolicy=1. That's probably not going to work, so you'll want to switch those two around. You should set the controls before you do any input, or stuff might go awry; trying to change midstream isn't really a good idea.
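The ordering point can be checked mechanically before a command goes into a script. The helper below is hypothetical (not part of Ghostscript): it splits a command line with Python's shlex using POSIX rules, so Windows paths with backslashes would need different handling, and reports every switch that comes after the first input file. Applied to the command from the question, it flags -dPDFACompatibilityPolicy=1, which sits after PDFA_def.ps.

```python
import shlex


def late_switches(command: str) -> list[str]:
    """Return switches that appear after the first input file.

    A rough lint: it knows that "-o" consumes the next argument as the
    output file and that tokens between "-c" and "-f" are PostScript,
    but nothing else about Ghostscript's argument syntax.
    """
    args = shlex.split(command)[1:]  # drop the executable name; POSIX quoting rules
    seen_input = False
    in_postscript = False
    late: list[str] = []
    i = 0
    while i < len(args):
        arg = args[i]
        if in_postscript:
            in_postscript = arg != "-f"  # "-f" ends the PostScript section
        elif arg == "-c":
            in_postscript = True
        elif arg == "-o":
            i += 1  # skip the output file name
        elif arg == "-" or not arg.startswith("-"):
            seen_input = True  # a file name (or "-" for stdin)
        elif seen_input and arg != "-f":
            late.append(arg)
        i += 1
    return late
```

Moving -dPDFACompatibilityPolicy=1 in front of PDFA_def.ps makes the list empty.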
I used gs (v9.21) to convert a PDF with annotations set to non-printing (hyperref) to a PDF/A compliant file. Annotations will not be present in the output file but, in my case, that was not an issue.
The command I used is:
gs -dPDFA=2 -dBATCH -dNOPAUSE -dPDFACompatibilityPolicy=1 -dUseCIEColor -sProcessColorModel=DeviceGray -sDEVICE=pdfwrite -sOutputFile=output_file.pdf input_file.pdf
Notes:
Use -dPDFACompatibilityPolicy=1, not -sPDFACompatibilityPolicy=1. The -d form makes gs drop the annotation, while the -s form does not.
I used -dUseCIEColor because PDF/A validation (https://www.pdf-online.com/osa/validate.aspx) failed with an issue related to the color space. This parameter is deprecated, but I did not find any other way around this issue. For more details, see "Convert PS files to PDF/A via Ghostscript, color space problems".
Like KenS said, it's hard to know anything without a PDF to look at but since you're having trouble with the Google home page converted to PDF, I suspect that it's the external links that are causing the problem. Links are annotations and in PDF/A, external links are not permitted. Any link in HTML when converted to PDF will be considered external.
Gentlepeople,
I'm using the command line version of Ghostscript for Windows to convert PDF to PNG images. However, I noticed that the annotations (such as comments, shapes, attached files: anything the user can put on top of the original PDF) are also converted and appear in the image output. Is there any way to make Ghostscript ignore comments in a PDF?
Your help is appreciated :-)
I had the same question. I found a setting in Ghostscript which turns off comment printing (comments are called annotations in its documentation): http://www.ghostscript.com/doc/current/Use.htm
The switch is -dShowAnnots=false, which is case-sensitive. For example, to convert a file to PNG (which was also what I wanted to do), you would use something like:
gswin64c -sDEVICE=png16m -sOutputFile="OutFile.png" -r300 -dShowAnnots=false "InputFile.pdf"
Using this command line format gave me exactly what I wanted: The first page of the source PDF converted to true-color PNG format without transparency, at 300 DPI, without any of the comments from the PDF.
Had this error:
BBox has zero width or height, which is not allowed.
I found this hint, but without a solution: https://bugs.ghostscript.com/show_bug.cgi?id=696889
I already used
-dPreserveAnnots=false
but the error still appeared.
-dShowAnnots=false fixes it for me.