PDF to PostScript Using Ghostscript: large files having issues printing

I'm currently using Ghostscript to convert 500 page PDF files into PostScript.
I'm using Windows 7, Ghostscript x64 v 9.16, and a Kodak Digimaster Commercial Printer.
I use the following arguments for GhostScript to convert a PDF into PS:
"C:\Program Files\gs\gs9.16\bin\gswin64c.exe"
-dCompressFonts=true
-dSubsetFonts=true
-dEmbedAllFonts=true
-sFONTPATH=C:\Windows\Fonts\
-dNOPAUSE
-dBATCH
-sDEVICE=ps2write
-sOutputFile="PostScript.ps"
"MyPdf.pdf"
I then add %KDK (proprietary) commands to dictate which pages need to print on which paper by using the %KDKSlip command based on the Printer documentation.
The example below would print all pages on Letter duplex except for pages 1/2 and 5/6. Pages 1/2 would print on a paper defined under the name of "YellowPerf", while 5/6 would print on "TriPerf":
%!PS-Adobe-3.0
%%BoundingBox: 0 0 612 792
%%HiResBoundingBox: 0 0 612.00 792.00
%%Creator: GPL Ghostscript 916 (ps2write)
%%LanguageLevel: 2
%%CreationDate: D:20150506143059-05'00'
%%Pages: 8
%%DocumentMedia: Letter 612 792 0 white ()
%%+ YellowPerf 612 792 0 yellow ()
%%+ TriPerf 612 792 0 white ()
%KDKRequirements: duplex
%KDKSlip: YellowPerf duplex 1
%%+ TriPerf duplex 5
%%EndComments
%%BeginProlog
This is then sent to a Kodak Digimaster printer using a Windows command:
> COPY PostScript.ps PrinterName
This has worked fine with smaller documents, but I'm having issues with larger page sets.
When I attempted to print a 500-page PostScript file (converted from PDF) on the Digimaster, the printer reported an error: "Busy, do not reset the RIP".
File size of those that didn't work:
PostScript File Size: 52 MB
PDF File Size: 41 MB
File sizes of those that did work:
PostScript File Size: 1 MB
PDF File Size: 0.8 MB
Why does this work fine with smaller files but get hosed on larger files?
Would anyone have any advice?

It is not necessarily the filesize of the PostScript that causes your problem:
It could be the PostScript itself, or
it could be that you made a mistake with your editing of the PS files when you inserted the (proprietary) %KDK-comments.
Are you sure your text editor doesn't silently change your linefeed characters?! This could also change the binary parts of the PostScript!
Also, I'm not sure the copy command handles print jobs the way it should. I would prefer the lpr command (ah... is that even still available on your version of Windows?!)
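One way to rule out the line-ending concern above is to insert the comments with a script that treats the file as raw bytes, so nothing outside the inserted lines can change. The following is only a minimal sketch: the function name is made up for illustration, and it assumes the first %%EndComments in the file is the one in the DSC header.

```python
def insert_before_end_comments(ps_data: bytes, extra_lines: list[bytes]) -> bytes:
    """Insert extra header comment lines just before the first %%EndComments.

    Works on bytes only, so binary sections of the PostScript are untouched.
    """
    marker = b"%%EndComments"
    index = ps_data.find(marker)
    if index < 0:
        raise ValueError("no %%EndComments line found")

    # Reuse whatever line ending the file already uses right after the marker.
    tail = ps_data[index + len(marker): index + len(marker) + 2]
    if tail.startswith(b"\r\n"):
        eol = b"\r\n"
    elif tail.startswith(b"\r"):
        eol = b"\r"
    else:
        eol = b"\n"

    block = b"".join(line + eol for line in extra_lines)
    return ps_data[:index] + block + ps_data[index:]
```

To use it on real files, read the input with open(name, "rb") and write the result with open(name, "wb"); text mode would reintroduce the very line-ending translation this is meant to avoid.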
To debug this and to explore a few different roads to successful printing, I would try a few different steps:
To debug
Send the original PostScript, without the added %KDK header comments, to the printer.
That printer model has a nice feature you can use: you can check whether its RIP processes the input file completely and successfully without printing your 500 pages on the (wrong) paper, wasting it and then having to discard it. Just click the red "Stop" button on its user interface monitor.
Does that one complete the RIP process successfully?
Yes? Then you can even print it. Before you do, you can modify the job settings to select a particular paper tray by clicking on some button on the interface (can't recall the exact button label though). Then "release" the job and it will print.
If it worked, you can turn your attention back to getting your %KDK lines right.
If it didn't you have to try another route.
Check if a different PDF-to-PS converter is working
Create a PostScript file with the help of pdftops (see here for the pdftops.exe version -- read the README to see which options are available).
Proceed as above: first see if it completes the RIP process. Then continue with your %KDK manipulations....
Check if the direct PDF printing is working
The Digimaster model can consume PDF directly. (Well, internally it uses its own PDF-to-PS converter, but that isn't visible to the outside -- so it doesn't really count as a PDF RIP...)
If that works, you can even prepend your appropriate %KDK comments to the PDF file, similar to the lines below (don't rely on me getting the details right, it's off the top of my head, and the memory is decades old!):
%!PS-Adobe-3.0
%%.........................
%%DocumentMedia: ..........
%KDKRequirements: .........
%KDKInserts: ..............
%KDKSlip: .................
%KDKBody: .................
%KDKCovers: ...............
%KDKPDFPrintAnnotations: on
%KDKPDFFitToPage: on
%KDKBinaryOK: on
<esc>%-12345X
%%Emulation: pdf
%PDF-1.5
%...here follow the lines of the original PDF file...
...
Send jobs via "Kodak Printfile Downloader" (KPD)
For Windows there used to be the so-called 'Kodak Print File Downloader' (KPD). The KPD is an application, not a printer driver. Not sure if it is still available.
You could open its GUI, then load a PS, PDF, PCL or TIFF file into its to-be-printed-list of jobs. Then select a few job options (like trays, stapling, sorting etc.). Lastly, send the job off to the Digimaster...
The KPD essentially does the same thing you want to achieve: it inserts %KDK commands into the file header. But you want to do it with a script or an editor (and possibly automatically via a batch process, once it works).
The KPD requires interactive user activity and cannot be scripted.
But you can (ab-)use it to intercept the files it creates from the Windows spooling system, study them and then adapt your scripted efforts so that they also work....
Update
(I had wanted to add this already in my initial answer. But time ran out, so I skipped it for the time being..)
Observe the RIP processing directly at the printer UI
Digimaster printers have their own built-in touchscreen, flatscreen or tube monitor (depending on the age of the model). They also typically have a full-time operator who knows the machine and its tweaks and peculiarities quite well. The machine may be quite a distance from the user sending a job.
So the following should be done when debugging a print problem:
Ask the operator to set the printer to "stop printing", but still "receiving new jobs".
Submit any job(s) you want.
Walk up to the printer and its operator.
Release the job for RIP-ping and observe what happens:
You may see everything going alright and completing until the last page (you know how many pages you submitted, right?)
Or you may see the job aborting at a certain page number.
Or you may see the printer RIP chewing extremely long on a certain page (or several pages), but finally completing the job.
Or you may see the printer RIP hanging with a certain page forever.
Or...
In any case, the details which are observable here may give important clues about where to look next...

Related

Converting Windows .PRN file (PCL) to PDF

I have been successfully capturing PCL content sent by old machinery to a parallel port and converting it to PDF using GhostPCL for a while.
However, we have some older industrial machinery which is based on Windows 2000 and outputs to a HP Laserjet printer via the parallel port. Unfortunately, the software on the machine does not allow additional software or printers to be installed.
The problem is that whilst the captured output appears to be PCL graphic data, I have not found any tools which can convert it - GhostPCL attempts, and you can make out the text a little, but it is completely corrupted.
The captured output results in the output from GhostPCL
I can see that the captured output starts with:
ESC E (PCL command for Reset)
ESC &l0L (PCL command to disable skip perforation)
ESC &r1U (*** UNKNOWN ***)
ESC &l1H (PCL command to Feed from tray 2)
ESC *o0M (*** UNKNOWN ***)
ESC &l26A (PCL command for A4 portrait paper)
ESC *g8W (PCL command to configure raster data - 8 bytes)
I can see that the captured output has some PCL codes which do not appear in the official documentation, which results in the weird characters at the start of the PDF.
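A listing like the one above can also be produced with a script rather than by reading a hex dump. The sketch below is not a full PCL parser; it assumes the general PCL 5 escape layout (ESC plus one character for simple commands, or ESC plus a parameterized character, an optional group character, values, and an uppercase terminator) and that a command ending in W is followed by that many bytes of binary data. The function name is made up for illustration.

```python
import re

# ESC + one char (e.g. ESC E), or a parameterized sequence (e.g. ESC & l 26 A).
_PCL_ESCAPE = re.compile(
    rb"\x1b(?:"
    rb"(?P<param>[!-/][`-~]?(?:[+-]?[0-9.]*[`-~])*[+-]?[0-9.]*[@-^])"
    rb"|(?P<simple>[0-~])"
    rb")"
)


def list_pcl_escapes(data: bytes, limit: int = 50) -> list[str]:
    """Return the first `limit` escape sequences found in a captured print file."""
    found = []
    pos = 0
    while len(found) < limit:
        match = _PCL_ESCAPE.search(data, pos)
        if match is None:
            break
        text = (match.group("param") or match.group("simple")).decode("ascii")
        found.append("ESC " + text)
        pos = match.end()
        # Assumption: "<n>W" announces n bytes of binary data, which may
        # themselves contain ESC bytes, so skip over them.
        if match.group("param"):
            count = re.search(r"(\d+)W$", text)
            if count:
                pos += int(count.group(1))
    return found
```

Running it over the first few kilobytes of the capture shows which commands appear before the raster data starts, which makes it easier to look up the unknown ones.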
Does anyone know how to convert this file to PDF ?

Your description suggests the data is in a language other than PCL. Was the older system perhaps designed to talk to a default printer using Epson escape encoding? If so, it is a dot-printing file in which the hardware positions the dots in the correct place and with the correct pressure; you could try converting it to a BMP and then massaging the image into a PDF page.
You say an HP is connected, but does the capture truly reflect what the system agrees to use at runtime?
For example, if I save my HP inkjet's print file at the time of printing, I get a PDF! Why, when the printer normally can't handle PDF directly?
What might I find if I look and see that the default printing language is not set to PCL or PJL?
There are many PCL language variations, so a PRN file should ideally carry some compatibility declaration such as @PJL ENTER LANGUAGE = PCLXL followed by ) HP-PCL XL;2;. Note that this example is NOT from an HP printer, just from one that declares that the following code will be HP style.
The printer can accept many formats, and the system can produce many different formats for one printer. So you need to check all the system settings to understand which language is actually in use at run time. Are you sure that the .PRN file is a complete record of the conversation between the system and the printer? Print-to-file does not always capture the expected two-way exchange.
The best way to ensure you capture true printer driver output is to change the driver's output PORT to a fixed filename, ideally with the correct format extension and NOT an unspecified .prn or .pcl if the data is not that.

Issue with ghostscript rendering PPT into PDF

I've been tinkering with Ghostscript with a port monitor (on an HP PCL 6 Universal driver) to convert print jobs into PDF. I've tested with a few applications such as Word, Excel, Adobe Reader, Microsoft Edge etc. and they are all working properly.
However upon testing Microsoft Powerpoint 2016, it seems like there are some graphics that are unable to be rendered properly through Ghostscript.
Actual Slide Below
Output From Ghostscript in PDF Below
I've even tested this with some other PDF generators such as BioPDF, CutePDF and Adobe PDF, and they all produce the same output as above.
Has anyone faced similar issues before? If so, could someone point me in the right direction?

What you are doing isn't a single step PowerPoint to PDF and Ghostscript is not rendering the PowerPoint. In fact if you are creating a PDF file Ghostscript isn't (ideally) rendering anything.
What's actually happening is that you are asking PowerPoint to print to a canvas, which is then passed to the PostScript printer driver. That produces PostScript which is sent to the Port. Your (and others) Port Monitor then sends the PostScript to the 'Distiller' (in your case Ghostscript and the pdfwrite device). The Distiller reformats the vector drawing commands into a PDF format and builds a PDF file from them. It doesn't render (turn into a bitmap image) anything unless forced to.
Obviously there are several places along that road where the problem could creep in. Given that you say that the Adobe product (the others you mention all use Ghostscript) has the same problem, I think it's safe to assume that the problem isn't Ghostscript.
This also means that you aren't using the driver you think you are. Adobe can't handle PCL as an input medium as far as I'm aware, and nor can Ghostscript. GhostPCL will handle PCL as an input, but that's not what you say you are using.
Of course you haven't linked to an example file to demonstrate the problem, nor supplied an example command line, so this is all supposition.
Now if, somehow, you are using a PCL6 device, then the problem is most likely due to the presence of rasterOps in the output. Rasterops are part of the PCL imaging model which do not exist in PDF and are a form of transparency. There are three ways to handle such content for a PDF output device; firstly render the whole page content to an image, secondly ignore the rasterOps objects, thirdly treat the rasterOps as opaque.
GhostPCL and the pdfwrite device take the third option. So, it's just conceivable that your original content has some transparent objects which are being handled as rasterOps by the PCL printer driver, and then rendered as opaque by GhostPCL and the pdfwrite device.
If that's somehow the case then the solution is simple; don't use a PCL printer driver, use the PostScript one.
If you post a link to a (simple, eg single page) example of what you are sending to Ghostscript, and a command line, then I can look at it. Please don't send me the PowerPoint, I can't use it and even if I could, my print setup would not match yours. I need the data being sent to Ghostscript.
[EDIT after looking at files]
Don't mean to sound like I'm lecturing, the problem is people find these result on Google searches and then try to apply them based on a poor understanding of what's happening. So I find it best to be really clear in my answers about what's going on. It saves questions later :-)
The first thing I see is that the PCL is indeed PCL, and if you try running that through Ghostscript it throws horrible errors and exits. So presumably you aren't doing that.
The PostScript file contains 2 pages and nothing except huge images, rendered (presumably at 600 dpi); the two pages look like your images above. That is why the PostScript is more than 20 times larger than the PCL file.
But.... If I open the .ppt file with OpenOffice (4.0.0 is what I have to hand) I see exactly the same thing. I don't, I'm afraid, have a copy of Microsoft PowerPoint, but from what I see here there are two conclusions;
firstly that the PDF I get looks pretty much like the PowerPoint when viewed with OpenOffice at least. So there's something 'interesting' about your PowerPoint.
secondly, even if that's not what you expect, it's what's in the PostScript program. That means that either PowerPoint rendered the slide to a bitmap or the Windows printing system/HP driver did.
Now, if I run the PCL through GhostPCL instead of Ghostscript (rendering, not producing a PDF) then the result is more like what I think you are expecting. However, when sent to a PDF file the result is horrible. That strongly suggests to me that there's some form of transparency involved: PostScript doesn't support transparency at all, and PCL does it through rasterOps.
I'm afraid that this means that the problem lies either in PowerPoint, the Windows print system or the PostScript printer driver you are using. Since the PCL is at least close to what you expect, I suspect that this means PowerPoint is doing the right thing, and it's the printer driver messing up. It appears you are using the Windows PostScript printer driver.
So there's no way you can 'fix' this for files like this, at least not with Ghostscript. You would need to 'fix' the Windows PostScript printer driver, or possibly the Windows print system. You could try reporting a bug to Microsoft, presumably these files print incorrectly when sent to physical PostScript printers too.

Adjusting format of PDF to print it faster

I am using a combination of iTextSharp and PdfSharp to assemble a large PDF file for printing to a Canon Oce VarioPrint 6000 series printer. The PDF is replacing a postscript file.
Both this new file and the old are transferred to the printer via an LPR command.
The postscript file would take maybe 10 minutes to rip to the printer. My PDF version of the same file is taking over 30 minutes to process before it is ready to print.
Can anyone give me pointers into ways I could change the way this file is written / created that would decrease the processing time on the Vario?
EDIT: I took the file that was ripping so slowly and ran it through Acrobat Preflight and it found many RGB images, that it wanted to convert to CMYK. When I look at the PDF though, they are all black and white logos, so I had Preflight do a fix up to convert all images to print Black and White.
I also noticed the Preflight was consolidating backgrounds. Half of the pages have the same logo on them, so leveraging this conversion is probably also helpful.
When I LPR'd that file, it copied and ripped in less than 5 minutes! So I guess the real question is how can I do that programmatically?
I am modifying the title and tags.
Thanks!

A result equivalent to the Preflight repair can be obtained in this case with iText (or in my case, iTextSharp). I replaced the PdfSharp method of aggregating the PDFs with the PdfSmartCopy class. Combined with iText's reader.RemoveUnusedObjects(), this brought the size of the output PDF down significantly, and my rip time to the printer dropped to the same as or below the rip times we previously had with the PostScript file. Very pleased.
So the RGB images that were probably contributing to the long processing time were reduced by PdfSmartCopy removing the duplicates.
More info on PdfSmartCopy can be found at: http://api.itextpdf.com/itext/com/itextpdf/text/pdf/PdfSmartCopy.html
and in Bruno's book, iText In Action, more specifically in Chapter 6.

Saving the output from DiffPDF / ComparePDF command line. - Comparing folders of PDF's

We have to do a comparison of about 1500 PDF's in one folder with 1500 PDF's in another to check for visual differences.
We have found DiffPDF(and comparePDF command line version) for Windows which is a lot faster than our automated Acrobat Pro comparisons.
So far I have used:
comparepdf -v=2 -c=a old.pdf new.pdf
but the problem with this is that it just returns "these files are different". Does anyone know of any way to save the output from the command line? You can do this from the GUI, but that would mean using something like TestComplete to automate it :(
Or are there better ways of comparing 2 PDFs visually, with output/highlighting?
Bonus points for C# .net libraries.

You could have a look at these answers to similar questions:
PDF compare on linux command line
How to compare two pdf files through command line
How to unit test a Python function that draws PDF graphics?
However, I have no idea if any of these would be performing faster than what your automated Acrobat Pro comparison does... Let me know if you found out, will you?
Shortcut:
For simplicity, let's assume your input files to be compared are similar enough, and each being only 1 page. (For multi-page input expand the base idea of this answer...)
The two most essential commands any such comparison boils down to are these:
compare.exe ^
%input1% ^
%input2% ^
-compose src ^
%output%.tmp.pdf
and
pdftk.exe ^
%output%.tmp.pdf ^
background %input1% ^
output %output%.pdf
The first command generates a PDF with all differential pixels colored in red. (A default resolution is used here, 72 dpi. For a more fine-grained view on pixel differences add -density 200 (that will mean: 200 dpi) or higher -- but your processing time will increase accordingly as will the disk space needed by the output...)
The second command tries to merge the resulting PDF with a background taken from %input1%.
Optionally, you may add -verbose -debug coder after the compare command for a better idea about what's going on.
compare.exe is a commandline tool from the great, great ImageMagick family of utilities (available for Linux, Windows, Unix and MacOSX). But it requires a Ghostscript installation to use as a 'delegate' in order to be able to process PDF input. pdftk.exe is also a commandline utility, available for the same platforms. Both are Free Software.
After the first command, you'll have an output file which has only red pixels where there are differences found on the page.
After the second command, you'll have an output with all red 'diff' pixels in the context of the first input PDF.
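For two folders of PDFs, the same two commands can be generated per file pair by a small script. This is only a sketch under a few assumptions: files are matched by identical file name, the executables are reachable as "compare" and "pdftk", each input is a single page (as assumed above), and the function names are made up for illustration.

```python
from pathlib import Path


def diff_commands(old_pdf, new_pdf, out_pdf, compare_exe="compare", pdftk_exe="pdftk"):
    """Build the compare and pdftk command lines for one pair of PDFs."""
    old_pdf, new_pdf, out_pdf = Path(old_pdf), Path(new_pdf), Path(out_pdf)
    tmp_pdf = out_pdf.with_suffix(".tmp.pdf")  # intermediate red-pixel diff
    compare_cmd = [compare_exe, str(old_pdf), str(new_pdf),
                   "-compose", "src", str(tmp_pdf)]
    pdftk_cmd = [pdftk_exe, str(tmp_pdf), "background", str(old_pdf),
                 "output", str(out_pdf)]
    return compare_cmd, pdftk_cmd


def folder_jobs(old_dir, new_dir, out_dir):
    """Pair up PDFs that exist under the same name in both folders."""
    old_dir, new_dir, out_dir = Path(old_dir), Path(new_dir), Path(out_dir)
    jobs = []
    for old_pdf in sorted(old_dir.glob("*.pdf")):
        new_pdf = new_dir / old_pdf.name
        if new_pdf.exists():
            jobs.append(diff_commands(old_pdf, new_pdf, out_dir / old_pdf.name))
    return jobs
```

Each command list can be handed to subprocess.run as is. It is probably wise to inspect the return code yourself rather than raise on any non-zero value, since compare may use its exit status to signal that differences were found.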
Example output:
Here are screenshots of two 1-page PDF files with differences in their content:
Here are screenshots of the output produced by the two commands above:
The left one shows the intermediate result (after first command), with only the difference pixels displaying as red (identical pixels being white).
The screenshot on the right shows the red difference pixels, but this time with the input PDF file number 1 as a (gray) background (after second command).
(PDF input files courtesy of Mark Summerfield, author of the beautiful DiffPDF tool.)

I had the same problem, diffpdf is quick and nice but GUI only.
[comparepdf] is console one but reports only exit code (no diff itself).
[diff-pdf] has both console mode and diff.pdf output but it is slow and output is not friendly.
I have tried to add the required code to diffpdf,
you can find it here: http://github.com/taurus-forever/diffpdf-console

Fine-tuning ghostscript PDF to PS conversion

I have a program that generates a PDF as output. If I send this file to a printer using the Adobe viewer, it prints exactly as wanted. In particular, the application is printing labels and there's a requirement that every last pixel on the page is used, i.e. no margins whatsoever.
I'd like to try and automate this process. GhostScript seemed a logical choice. I used the command lines
gs -dBATCH -dNOPAUSE -sDEVICE=psmono -sOutputFile=A4_300.xxx -sPAPERSIZE=a4 A4_Print.pdf
... or alternatively
gs -dBATCH -dNOPAUSE -sDEVICE=ljetplus -sOutputFile=A4_300.xxx -sPAPERSIZE=a4 A4_Print.pdf
I can send the output file, A4_300.xxx, to the printer via LPR and it almost prints well, but there's about 6-8 mm missing on all sides, i.e. there's a margin being enforced, and the text that should be printing in that area is actually being cut off.
Paper size should be a4, and that much is working correctly. But how can I arrange for the output to fill the whole page?
The output device is "some kind of HP laser printer"; I haven't seen the physical device. A similar printer I tested with was able to process output both for "psmono" (that produced PostScript) and "ljetplus" (binary, but printable).
Any advice, please?

First of all: are you sure that your printer is physically able to print edge-to-edge? Which printer model is it?
It may well be that the printer itself enforces the "missing 6-8 mm on all sides". Since you see the margin "area actually being cut off", it means the printer indeed receives the complete image, but it crops the image to what appears as *ImageableArea keywords in PostScript printer PPDs (PS Printer Description files).
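If a PPD for the printer is at hand, the margins it declares can be read off directly and compared with the 6-8 mm you are seeing. The sketch below assumes the usual PPD entry layout (*PaperDimension with width and height, *ImageableArea with lower-left and upper-right corners, all in points); the function name is made up for illustration.

```python
import re

_ENTRY = re.compile(
    r'^\*(ImageableArea|PaperDimension)\s+([^/:\s]+)[^:\n]*:\s*"([^"]*)"',
    re.MULTILINE,
)

MM_PER_POINT = 25.4 / 72


def ppd_margins_mm(ppd_text: str, size: str):
    """Return (left, bottom, right, top) unprintable margins in mm for a page size."""
    values = {}
    for key, name, numbers in _ENTRY.findall(ppd_text):
        if name == size:
            values[key] = [float(n) for n in numbers.split()]
    llx, lly, urx, ury = values["ImageableArea"]
    width, height = values["PaperDimension"]
    return (
        llx * MM_PER_POINT,
        lly * MM_PER_POINT,
        (width - urx) * MM_PER_POINT,
        (height - ury) * MM_PER_POINT,
    )
```

For instance, an A4 entry whose imageable area starts 18 points in from every edge corresponds to a margin of about 6.35 mm on each side.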
If your printer supports edge-to-edge printing indeed, then you may need to enable it as a default...
...by some semi-secret setting in the front panel menu (if your printer has something like that), or...
...by accessing the web-based printer configuration panel from your computer's browser (should your printer support that), or...
...by logging into the printer via telnet, rsh, ssh or msh (depending on your printer to allow this).
The actual procedure to set this depends on your printer model. It should be described in the printer manual.
If you are unlucky, the device simply doesn't support borderless printing. Then buy or find a model that does what you want ;-)
Update: I had missed your statement "If I send this file to a printer using the Adobe viewer, it prints exactly as wanted." From this I conclude that your printer must indeed be supporting borderless printing.
If your LPR client uses any form of PPD (as is the case if you print via CUPS, for example), then check out my hints about modifying PPDs (which also works for Windows systems) here:
"What lpr arguments do I need to print a 1400x800 pixel image on a 4x6 label?"
"What's the easiest way to add custom page sizes to a PPD?"
Most likely you do not need to finetune your Ghostscript output; it is fine as the cropped printouts show.
Most likely you need to tweak your LPR client so that its "driver" does not destroy what you want to send to the printer.
