Error when opening PhantomJS-generated PDF in Adobe Acrobat Reader DC

I am generating a PDF using PhantomJS, and it opens fine with the Mac's built-in Preview, Google Docs, and a few other tools that I tested it on. However, when I open it using Adobe Acrobat Reader DC version 15.010.20056, I receive one of the most unhelpful messages of all time.
After this, my PDF is only partially displayed. This happens on both PCs and Macs. I have no idea how to debug this or even where to start looking for the cause.

In case anyone was wondering: PhantomJS doesn't render tiling patterns properly. It sets their X and Y offsets to 0, which does not conform to the PDF specification. This is one of the many reasons why a PhantomJS-generated PDF looks different depending on which viewer you open it with.
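To check whether a generated file has this problem, one rough approach is to scan the raw file for the tiling-pattern step values. The sketch below assumes that the "X and Y offsets" mentioned above are the pattern's /XStep and /YStep entries, which the PDF specification requires to be non-zero. It is a byte-level heuristic, not a PDF parser, and the class name TilingPatternScan is hypothetical. Because a tiling pattern is a stream, its dictionary cannot be stored inside a compressed object stream, so a plain-text scan will usually see it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Heuristic scan for tiling patterns whose /XStep or /YStep is zero.
// Assumes the pattern dictionaries are stored uncompressed; not a full PDF parser.
class TilingPatternScan {
    private static final Pattern STEP =
            Pattern.compile("/([XY]Step)\\s+([-+]?(?:\\d+\\.?\\d*|\\.\\d+))");

    // Returns a description of every zero-valued XStep/YStep entry found.
    static List<String> findZeroSteps(String rawPdf) {
        List<String> hits = new ArrayList<>();
        Matcher m = STEP.matcher(rawPdf);
        while (m.find()) {
            if (Double.parseDouble(m.group(2)) == 0.0) {
                hits.add(m.group(1) + " = 0 at offset " + m.start());
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps each byte to one char, so offsets match file offsets.
        String raw = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.ISO_8859_1);
        findZeroSteps(raw).forEach(System.out::println);
    }
}
```

Run it with the path of the PhantomJS output as the only argument; an empty result only means no zero step values were found in uncompressed dictionaries.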


pdfbox embedding subset font for annotations - part 2

I am creating a separate question that stems from this one; the code used is almost the same. The original problem was about subsetting a font with PDFBox, which I have more or less dealt with. However, I ran into another problem: annotations, and how the fonts used in them are interpreted, particularly by Acrobat Reader DC.
I tried different combinations of fonts and embedding options and became rather desperate. I had a feeling that the way these things are handled by the programs that interpret PDF files is non-standard. I think I read somewhere that annotations, and the way they are displayed, are deliberately left unstandardized by the PDF format to give viewers the freedom to handle them in their own way, since the main purpose of annotations is interaction with the user. TL;DR: I cannot understand why Acrobat Reader DC doesn't like the annotations I have created and saved with PDFBox. I even opened a question on Adobe's friendly and helpful User Community forum, but, as I expected, someone suggested that I investigate it further with the PDFBox team.
Anything is possible, but rather than writing to the PDFBox mailing list (I have never gotten used to mailing lists or figured out how to use them efficiently), I want to ask here, because I hope it could help others understand the PDF format better.
I am basically rephrasing my question from the Adobe forums here. Here is an example (Google Drive link) with FreeText annotations (it seems to make no difference if I use Stamp annotations instead). It causes problems when opened by Adobe Acrobat Reader DC version 21.001.20149.37945 (I think this corresponds to the April 16th '21 update). Specifically, the problem happens when the Comments pane is opened by the user, either manually or automatically.
While experimenting, I also tried unchecking the "Use local fonts" option in Preferences -> Page Display. I had the impression that Acrobat Reader might be more eager to show the error message when it is not allowed to substitute local fonts for the erroneously embedded ones. I am not sure if this is true.
The error that I get is the infamous "Cannot extract the embedded font XXXXXX+SomeFontName".
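The XXXXXX+ part of the name in that message is a font subset tag: the PDF specification prefixes the name of an embedded subset font with exactly six uppercase letters and a plus sign. When comparing the font names a viewer complains about with those in the file, a small helper like the hypothetical SubsetTag class below can split that prefix off. It is only a sketch that checks the naming convention; it says nothing about whether the embedded font data itself is valid.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Splits a PDF subset tag (six uppercase letters + '+') from a font name.
class SubsetTag {
    private static final Pattern TAGGED = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    // True if the name follows the subset-tag convention, e.g. "ABCDEF+LiberationSans".
    static boolean isSubset(String baseFont) {
        return TAGGED.matcher(baseFont).matches();
    }

    // Returns the name without its subset tag, or the name unchanged if untagged.
    static String stripTag(String baseFont) {
        Matcher m = TAGGED.matcher(baseFont);
        return m.matches() ? m.group(2) : baseFont;
    }

    public static void main(String[] args) {
        System.out.println(stripTag("QWERTY+LiberationSans")); // LiberationSans
    }
}
```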
The same problems also happen if I use full font embedding (subsetting option set to false when calling PDType0Font.load). I also tried embedding OpenSans instead of LiberationSans, manually converting LiberationSans with FontForge to a TTF font with fewer glyphs, and even using the Windows ARIALN.TTF, thinking that maybe the font was the problem. All of them cause the same behavior in Acrobat Reader DC. I also ran Acrobat Pro 2019 Preflight on the document, and the profile that scans for possible font inconsistencies reports no errors.
Of course, when I use e.g. PDType1Font.HELVETICA instead of custom TTF font, I do not get the above errors. But I cannot use it because it does not contain the glyphs for the Unicode characters that I use. Does anybody have a better idea?
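Whether a given string can be drawn with a standard 14 font at all can be checked up front. The sketch below assumes the standard font is used with WinAnsiEncoding and approximates that encoding with Java's windows-1252 charset (the two are close but not byte-for-byte identical). The class name WinAnsiCheck is hypothetical; Greek text like the one in the question fails the check, which is why an embedded Unicode font is needed.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Approximates "can this text be shown with a standard 14 font in WinAnsiEncoding?"
// using the windows-1252 charset as a stand-in for WinAnsiEncoding.
class WinAnsiCheck {
    static boolean fitsWinAnsi(String text) {
        // A fresh encoder per call, since CharsetEncoder instances are not thread-safe.
        CharsetEncoder enc = Charset.forName("windows-1252").newEncoder();
        return enc.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsWinAnsi("Hello"));                    // true
        System.out.println(fitsWinAnsi("\u03b1\u03b2\u03b3\u03b4")); // false (Greek)
    }
}
```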
Thank you very much!
EDIT: To make myself clear, the error does not appear ALWAYS. It appears constantly on some machines (e.g. I can reproduce it fairly reliably on Windows 7 64-bit with the latest Acrobat Reader DC installed), while on my Windows 10 64-bit machine with the same version of Acrobat Reader DC it sometimes appears and sometimes doesn't; I haven't figured out why or in which cases. That made me suspect the font itself, but I checked that too: the font I am using opens fine on the machine where the problem is fairly constant.
UPDATE: At my wits' end again, I created a blank page with Apache OpenOffice, exported it to PDF, opened it with the latest Acrobat Reader DC, added a FreeTextTypewriter annotation (View -> Tools -> Comment -> Open) with 4 Greek letters in the ArialNarrow font, saved it, reopened it with Acrobat Reader DC, and it gave me the same error (cannot extract the embedded font...). So could this be a problem in Reader itself? They certainly made it difficult to diagnose. Here is the file, but I do not expect it to show errors on other machines. It's one of those moments when you start to believe in magic and the power of prayer (and a good sleep).
UPDATE 30/04/2021
So, to sum things up, I haven't come up with a solution yet, but I have three files, created with PDFBox, OpenPDF (an iText 5 fork), and Acrobat Reader DC itself (which can add annotations and save; I just added a simple text box with Greek text through the Comment pane), and they all produce the above error message when opened by Acrobat Reader DC. I have posted details in the Acrobat Reader forum here (same link as in the comment).
I have added the code that I used to create the OpenPDF example file here, and the 3 example files are in the same repository here.

Issue with ghostscript rendering PPT into PDF

I've been tinkering with Ghostscript and a port monitor (on an HP PCL 6 Universal driver) to convert print jobs into PDF. I've tested it with a few applications such as Word, Excel, Adobe Reader, Microsoft Edge, etc., and they all work properly.
However, when testing Microsoft PowerPoint 2016, it seems that some graphics cannot be rendered properly through Ghostscript.
Actual Slide Below
Output From Ghostscript in PDF Below
I've also tested this with some other PDF generators, such as BioPDF, CutePDF and Adobe PDF, and they all produce the same output as above.
Has anyone tried this and faced similar issues before? If so, could someone point me in the right direction?
What you are doing isn't a single step PowerPoint to PDF and Ghostscript is not rendering the PowerPoint. In fact if you are creating a PDF file Ghostscript isn't (ideally) rendering anything.
What's actually happening is that you are asking PowerPoint to print to a canvas, which is then passed to the PostScript printer driver. That produces PostScript, which is sent to the port. Your (and others') port monitor then sends the PostScript to the 'Distiller' (in your case Ghostscript and the pdfwrite device). The Distiller reformats the vector drawing commands into PDF format and builds a PDF file from them. It doesn't render anything (turn it into a bitmap image) unless forced to.
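For reference, the Distiller step in that chain is normally a Ghostscript invocation along the lines of the sketch below, which only builds the argument list for a ProcessBuilder. The executable name gswin64c (the 64-bit Windows console build) and the file names are assumptions about a typical setup; the switches themselves are standard Ghostscript options for producing a PDF with the pdfwrite device. The class name GhostscriptCommand is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Builds a Ghostscript command that converts a PostScript file to PDF via pdfwrite.
class GhostscriptCommand {
    static List<String> psToPdf(String gsExe, String inputPs, String outputPdf) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExe);                   // e.g. "gswin64c" on Windows (assumed)
        cmd.add("-dSAFER");               // restrict file access by the PostScript program
        cmd.add("-dBATCH");               // exit after processing the input
        cmd.add("-dNOPAUSE");             // don't wait for a keypress between pages
        cmd.add("-sDEVICE=pdfwrite");     // rewrite drawing commands as PDF, no bitmap
        cmd.add("-sOutputFile=" + outputPdf);
        cmd.add(inputPs);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = psToPdf("gswin64c", "job.ps", "job.pdf");
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

Having the exact command line like this is also what makes a problem reproducible for someone else.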
Obviously there are several places along that road where the problem could creep in. Given that you say the Adobe product (the others you mention all use Ghostscript) has the same problem, I think it's safe to assume that the problem isn't Ghostscript.
This also means that you aren't using the driver you think you are. Adobe can't handle PCL as an input medium as far as I'm aware, and nor can Ghostscript. GhostPCL will handle PCL as an input, but that's not what you say you are using.
Of course you haven't linked to an example file to demonstrate the problem, nor supplied an example command line, so this is all supposition.
Now if, somehow, you are using a PCL6 device, then the problem is most likely due to the presence of rasterOps in the output. RasterOps are part of the PCL imaging model; they do not exist in PDF and are a form of transparency. There are three ways to handle such content for a PDF output device: first, render the whole page content to an image; second, ignore the rasterOps objects; third, treat the rasterOps as opaque.
GhostPCL and the pdfwrite device take the third option. So it's just conceivable that your original content has some transparent objects which are being handled as rasterOps by the PCL printer driver, and then rendered as opaque by GhostPCL and the pdfwrite device.
If that's somehow the case then the solution is simple; don't use a PCL printer driver, use the PostScript one.
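To make the rasterOp point concrete: a ternary raster operation (ROP3, as used by PCL and Windows GDI) is an 8-bit truth table combining source (S), destination (D) and pattern (P) bits, so the result depends on what is already on the page, which has no equivalent in PDF's imaging model. The sketch below evaluates a ROP3 code on 8 pixels packed into a byte, treating 1 bits as white (as in RGB, where white is all ones). The codes 0xCC (source copy) and 0x88 (S AND D) are standard ROP3 values; the "opaque" comparison is a simplified illustration of the third option above, not GhostPCL's actual implementation.

```java
// Evaluates a ternary raster operation (ROP3) on 8 pixels packed into a byte.
// Bit i of the ROP code gives the output for the input combination
// i = (P << 2) | (S << 1) | D.
class Rop3 {
    static int apply(int rop, int s, int d, int p) {
        int result = 0;
        for (int i = 0; i < 8; i++) {
            if (((rop >> i) & 1) == 0) continue;   // this combination yields 0
            int pm = (i & 4) != 0 ? p : ~p;        // pixels where P matches bit 2 of i
            int sm = (i & 2) != 0 ? s : ~s;        // pixels where S matches bit 1 of i
            int dm = (i & 1) != 0 ? d : ~d;        // pixels where D matches bit 0 of i
            result |= pm & sm & dm;
        }
        return result & 0xFF;                      // keep only the 8 pixels
    }

    public static void main(String[] args) {
        int background = 0b10101010;  // existing page content (1 = white)
        int source     = 0b11110000;  // white left half, black right half
        // 0x88 (S AND D): white source pixels leave the background visible.
        System.out.println(Integer.toBinaryString(apply(0x88, source, background, 0))); // 10100000
        // Treated as opaque (0xCC, source copy): the white pixels hide the background.
        System.out.println(Integer.toBinaryString(apply(0xCC, source, background, 0))); // 11110000
    }
}
```

The difference between the two printed values is exactly the kind of change you would see when "see-through" white areas become solid.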
If you post a link to a (simple, eg single page) example of what you are sending to Ghostscript, and a command line, then I can look at it. Please don't send me the PowerPoint, I can't use it and even if I could, my print setup would not match yours. I need the data being sent to Ghostscript.
[EDIT after looking at files]
I don't mean to sound like I'm lecturing; the problem is that people find these results in Google searches and then try to apply them based on a poor understanding of what's happening. So I find it best to be really clear in my answers about what's going on. It saves questions later :-)
The first thing I see is that the PCL file really is PCL, and if you try running it through Ghostscript, it throws horrible errors and exits. So presumably you aren't doing that.
The PostScript file contains nothing but huge images (presumably rendered at 600 dpi). It has 2 pages, and they look like your images above, which is why the PostScript is more than 20 times larger than the PCL file.
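A rough calculation shows why a page stored as a bitmap dwarfs the same page expressed as drawing commands. The sketch assumes a 10 x 7.5 inch slide (the classic 4:3 PowerPoint size), 600 dpi and uncompressed 24-bit RGB; the real PostScript may use compression or a different colour depth, so treat the numbers as an order of magnitude only. The class name RasterSize is hypothetical.

```java
// Uncompressed size of a page rasterized at a given resolution.
class RasterSize {
    static long bytes(double widthIn, double heightIn, int dpi, int bytesPerPixel) {
        long w = Math.round(widthIn * dpi);   // pixels across
        long h = Math.round(heightIn * dpi);  // pixels down
        return w * h * bytesPerPixel;
    }

    public static void main(String[] args) {
        long perSlide = bytes(10, 7.5, 600, 3);
        System.out.println(perSlide);      // 81000000 bytes, about 81 MB per slide
        System.out.println(perSlide * 2);  // 162000000 bytes for two slides
    }
}
```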
But if I open the .ppt file with OpenOffice (4.0.0 is what I have to hand), I see exactly the same thing. I'm afraid I don't have a copy of Microsoft PowerPoint, but from what I see here there are two conclusions:
Firstly, the PDF I get looks pretty much like the PowerPoint, at least when viewed with OpenOffice. So there's something 'interesting' about your PowerPoint.
Secondly, even if that's not what you expect, it's what's in the PostScript program. That means that either PowerPoint rendered the slide to a bitmap, or the Windows printing system/HP driver did.
Now, if I run the PCL through GhostPCL instead of Ghostscript (rendering, not producing a PDF), the result is more like what I think you are expecting. However, when sent to a PDF file, the result is horrible. This strongly suggests to me that some form of transparency is involved: PostScript doesn't support transparency at all, and PCL does it through rasterOps.
I'm afraid this means that the problem lies either in PowerPoint, the Windows print system, or the PostScript printer driver you are using. Since the PCL is at least close to what you expect, I suspect PowerPoint is doing the right thing and it's the printer driver messing up. It appears you are using the Windows PostScript printer driver.
So there's no way you can 'fix' this for files like these, at least not with Ghostscript. You would need to 'fix' the Windows PostScript printer driver, or possibly the Windows print system. You could try reporting a bug to Microsoft; presumably these files print incorrectly when sent to physical PostScript printers too.

Removing screen reader data in PDF so Microsoft Edge can open it?

I am hit with a Microsoft Edge bug that has been around for a long time, and doesn't seem to get any attention: Microsoft Edge doesn’t open some PDF files if they have data for screen readers
I have an application that generates a PDF, which is then printed. To support Microsoft Edge and work around the bug, I am thinking of using PDFBox to open the PDF and strip out any data that gives Edge trouble. However, the issue is slim on details, and I can't find any information on what specifically triggers the problem. Does anyone have experience with this and can suggest what exactly I should strip out to make a PDF open in Edge?
[Edit]: Just to add: even if I download the PDF and open the local file in Edge, it still won't open, whereas the same local file opens fine in Chrome, IE11 and Firefox.
The file has some weirdness: if you open it in Notepad++, you can see that there is some data after %%EOF. Anyway, try this code, which removes some unneeded entries.
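import java.io.File;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.*;

try (PDDocument doc = PDDocument.load(new File("myfile.pdf"))) {
    PDDocumentCatalog cat = doc.getDocumentCatalog();
    // Entries that only affect how a viewer initially displays the file
    cat.getCOSObject().removeItem(COSName.PAGE_MODE);
    cat.getCOSObject().removeItem(COSName.VIEWER_PREFERENCES);
    // The root node of the page tree must not have a /Parent entry
    PDPageTree pageTree = cat.getPages();
    pageTree.getCOSObject().removeItem(COSName.PARENT);
    doc.save("myfile2.pdf");
}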
It is possible that not all three removeItem calls are needed; I can't test it myself.
If it still doesn't work, please ping me again and I'll try another idea (setting the mediabox at the page level).
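The answer also mentions data after %%EOF. To check for that (or strip it) without PDFBox, a byte-level sketch like the hypothetical EofTrim class below is enough. It keeps everything up to the last %%EOF marker; earlier %%EOF markers are legitimate when a file has incremental updates, so only the last one is used. Whether this trailing data is what trips up Edge is an assumption, not something the answer confirmed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Detects and removes bytes that follow the last %%EOF marker of a PDF.
class EofTrim {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Index just past the last %%EOF, or -1 if there is none.
    static int endOfLastEof(byte[] data) {
        outer:
        for (int i = data.length - MARKER.length; i >= 0; i--) {
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) continue outer;
            }
            return i + MARKER.length;
        }
        return -1;
    }

    // True if anything other than PDF whitespace follows the last %%EOF.
    static boolean hasTrailingData(byte[] data) {
        int end = endOfLastEof(data);
        if (end < 0) return false;
        for (int i = end; i < data.length; i++) {
            byte b = data[i];
            if (b != '\r' && b != '\n' && b != ' ' && b != '\t' && b != 0) return true;
        }
        return false;
    }

    // Copy of the file truncated right after the last %%EOF, plus a newline.
    static byte[] trim(byte[] data) {
        int end = endOfLastEof(data);
        if (end < 0) return data;
        byte[] out = Arrays.copyOf(data, end + 1);
        out[end] = '\n';
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        if (hasTrailingData(pdf)) {
            Files.write(Paths.get(args[1]), trim(pdf));
        }
    }
}
```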

Text is misaligned in PDF with fillable field only in Safari

I have a PDF that is generated with PDFlib that has fillable fields. They work as expected on Chrome, FF, Edge, IE, but not on Safari.
We have no Macs at work, so I'm looking at this using SauceLabs with OSX El Capitan, Safari 9.3 on 1376x1032 resolution.
When I fill out a fillable field in-browser on a form such as https://www.pdflib.com/pdflib-cookbook/pdf-on-the-web-server/starter-webform/php-starter-webform/ (again, Safari only) and then click outside the field, the text moves up a couple of pixels. When I click to edit again, it regains its original position; when I click outside, it moves up again.
Is this an unavoidable Safari bug, or is there anything I can do to prevent it? (I am currently reading the PDFlib documentation and have seen nothing that remotely mentions this.)
They work as expected on Chrome, FF, Edge, IE, but not on Safari.
Please keep in mind that all browsers currently have only limited PDF viewing support (they are getting better with each version, but they are not yet perfect).
Please use the PDF reference product, Adobe Acrobat (Reader), for viewing PDF files. Other PDF viewers, such as Apple Preview, are also not feature-complete and might not display everything correctly.
If it displays correctly in Adobe Acrobat Reader, you know the PDF is fine and the problem is a bug in the viewer you are using.

In search of a lightweight pdf viewer

I am looking for a lightweight PDF viewer (commercial or free) for my Windows application.
I currently display the PDF documents in a web browser with the Adobe Reader plug-in.
Background :
The problem I am having with Adobe Reader is the loading time. To display a PDF document for the very first time, Adobe Reader takes nearly 15 seconds! When the application is deployed at customer locations (usually running Windows Embedded), PDF viewing is even slower, sometimes taking more than a minute.
Hence I need to find an alternative to Adobe Reader.
My simple requirements are:
Lightweight: the viewer should initialise itself and load the PDF as fast as possible.
SVG support.
If anyone knows of such a tool, kindly let me know.
Regards
Srivatsa
Try Foxit PDF SDK.
Try SumatraPDF (download the full kit for the Mozilla plugin npPDFViewer.dll; sorry, there is no IE.OCX).
For a minimal install, use the portable executable in the same directory; you can then control it via DDE or the command line.
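Calling the portable SumatraPDF from an application via its command line could look like the sketch below. The executable and PDF paths are hypothetical placeholders; -reuse-instance (open the file in an already running window) and -page (jump to a page) are SumatraPDF command-line options, but check the documentation for the version you ship before relying on them. The class name SumatraLauncher is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Builds a SumatraPDF command line that opens a PDF at a given page.
class SumatraLauncher {
    static List<String> openCommand(String exe, String pdf, int page) {
        List<String> cmd = new ArrayList<>();
        cmd.add(exe);
        cmd.add("-reuse-instance");           // reuse a running SumatraPDF window
        if (page > 1) {
            cmd.add("-page");
            cmd.add(Integer.toString(page));  // 1-based page number
        }
        cmd.add(pdf);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical: portable executable placed next to the application.
        List<String> cmd = openCommand("SumatraPDF.exe", "manual.pdf", 3);
        new ProcessBuilder(cmd).start();
    }
}
```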
I think the best lightweight option for Windows, for those who would rather not use any plugin in Chrome, is MuPDF.
http://mupdf.com/