We have this big web project where the user can print the HTML to PDF. We are using dompdf, and have somewhat fixed the long-cell issues that caused the PDF to have several blank pages. Now the issue is that the saved PDF, when closed, always asks whether the user wants to save changes. I have verified that the PDF has the proper %%EOF, and have checked for object consistency. What else could be causing this problem?
After reading this introduction to PDF I realized that if the PDF was modified, I had to accommodate all the object offsets so that they would point to each object's start location.
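As a rough illustration of what adjusting the offsets involves, here is a minimal Python sketch that rescans a modified file for object headers and writes a fresh cross-reference table. It is not what dompdf does internally. The function name `rebuild_xref` and the sample bytes are hypothetical, and the sketch assumes a classic (non-stream) xref table, generation 0 everywhere, objects numbered 1..n, and no stream data that happens to contain a line looking like an object header. It also replaces the trailer with a minimal one, so entries such as /Info or /ID would be lost.

```python
import re

def rebuild_xref(pdf: bytes, root: int = 1) -> bytes:
    """Recompute the classic xref table of a PDF whose objects have moved."""
    old = re.search(rb"(?m)^xref\s", pdf)
    body = pdf[: old.start()] if old else pdf  # drop the stale table and trailer
    if not body.endswith(b"\n"):
        body += b"\n"
    # Byte offset of every "N 0 obj" header that starts a line.
    offsets = {int(m.group(1)): m.start()
               for m in re.finditer(rb"(?m)^(\d+) 0 obj\b", body)}
    size = max(offsets) + 1
    if sorted(offsets) != list(range(1, size)):
        raise ValueError("sketch only handles objects numbered 1..n")
    xref_pos = len(body)
    # Entry 0 is the head of the free list; every entry is exactly 20 bytes.
    out = body + b"xref\n0 %d\n0000000000 65535 f \n" % size
    for num in range(1, size):
        out += b"%010d 00000 n \n" % offsets[num]
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (size, root)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return out

# Hypothetical file whose table still carries stale offsets (1, 2) and startxref (3).
edited = (b"%PDF-1.4\n"
          b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
          b"xref\n0 3\n0000000000 65535 f \n"
          b"0000000001 00000 n \n0000000002 00000 n \n"
          b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n3\n%%EOF\n")
fixed = rebuild_xref(edited)
```

After the call, every in-use entry in `fixed` holds the byte position of its `N 0 obj` header and `startxref` holds the position of the `xref` keyword, which is the consistency a reader checks before deciding whether the file needs repair on save.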
I've built my own .PDF carefully following Adobe PDF specifications. macOS has Preview and that opens and displays the PDF document properly, but Acrobat Reader reports an error, not specified, and does not display the document.
If I open it with Preview and export it from there as another PDF, the result is shown properly by both Preview and Acrobat Reader.
The Preview version is inexplicably more complicated, and I cannot determine what Preview changed to allow Acrobat to handle it.
Nor have I been able to find software, even from Adobe's site, that will diagnose the problem.
I have attempted many reasonable variations, additions, etc. on my PDF, to no avail.
What's the secret formula to unlock the beast?
In response to the request, here's an abbreviated copy of the PDF file. All the controlling objects are present but I've deleted portions of the streams to spare you the scrolling. There are two fonts on one page which is an amalgam made of two separate source PDFs, one generated by me and the other from a different supplier, hence the preliminary /Contents objects setting up scaling preliminary to each page object.
Well, after placing the sample here, I get a reject message saying there was an error submitting the edit.
I'm going to try again after eliminating all the compression code in the streams and much of the uncompressed formatting code, to save you from all the scrolling.
(PDF code removed as requested. Stay tuned for update.)
In a PDF with many pages, is it possible to have only a single page be corrupted?
I've done some digging and couldn't seem to find anything so I am not even sure if this is possible, wondering if anyone has knowledge about it. And if it is possible how could I go about reproducing this? I've done some experimenting with editing hex values but it always renders the whole pdf file corrupt.
A PDF is a complex object graph. A bit simplified you have
document
|
+ lots of objects with different purposes
+ page tree
  |
  + .. some page
    |
    + content stream ("page description language")
    + resources
Now, as @mkl and @Setasign mention, there is a lot that can be malformed in the serialization format of this network.
In your concrete document the page reference, the content stream reference, the content stream content, a resource reference in the content, a resource's content, and so on could be the reason for the failure. To debug, you will need a copy of the PDF Reference, the invalid file, and a good PDF parser / browser tool.
Recreating a failure by blindly hacking hex in the document will most probably fail because:
The serialization format of the objects is indexed, so adding or removing bytes in the middle of a page will damage not only the page but the whole document.
Blindly changing some value can damage any (structural) content, not only page content. You must restrict yourself to changing the content of items that are bound to /Contents entries only. This is where, again, intimate knowledge of PDF is required.
In short: it's very unlikely that you can reproduce a page rendering error by haphazardly changing, adding, or removing hex in a PDF.
It's like having had a runtime error in one program and now wanting to recreate that error in another program by adding or removing characters...
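The point about the indexed serialization can be illustrated with a small Python sketch. The helper names `tiny_pdf` and `check_xref` are hypothetical, and the code assumes a file with a single classic xref table (no xref streams, no incremental updates). It compares two edits to the same object: an overwrite that keeps the length, and an insertion of one byte.

```python
import re

def tiny_pdf() -> bytes:
    """Build a three-object PDF with a correct classic xref table."""
    bodies = [b"<< /Type /Catalog /Pages 2 0 R >>",
              b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
              b"<< /Type /Page /Parent 2 0 R >>"]
    out, offsets = b"%PDF-1.4\n", []
    for n, body in enumerate(bodies, 1):
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (n, body)
    xref = len(out)
    out += b"xref\n0 4\n0000000000 65535 f \n"
    out += b"".join(b"%010d 00000 n \n" % o for o in offsets)
    out += b"trailer\n<< /Size 4 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref
    return out

def check_xref(pdf: bytes):
    """Return (startxref_ok, object numbers whose offset is stale)."""
    table = re.search(rb"(?m)^xref\s+(\d+)\s+(\d+)\s+", pdf)
    first, count = int(table.group(1)), int(table.group(2))
    declared = int(re.search(rb"startxref\s+(\d+)", pdf).group(1))
    bad = []
    for i in range(count):
        # Each xref entry is exactly 20 bytes: offset, generation, n/f flag, EOL.
        entry = pdf[table.end() + 20 * i : table.end() + 20 * (i + 1)]
        offset, gen, kind = int(entry[:10]), int(entry[11:16]), entry[17:18]
        header = b"%d %d obj" % (first + i, gen)
        if kind == b"n" and not pdf.startswith(header, offset):
            bad.append(first + i)
    return declared == table.start(), bad

good = tiny_pdf()
same_length = good.replace(b"/Count 1", b"/Count 9")  # overwrite in place
grown = good.replace(b"/Count 1", b"/Count 1 ")       # one extra byte in object 2

print(check_xref(good))         # (True, [])
print(check_xref(same_length))  # (True, [])
print(check_xref(grown))        # (False, [3])
```

The single inserted byte sits inside object 2, yet it is object 3 and the `startxref` pointer that become invalid, because everything after the insertion has shifted. That is why careless hex edits tend to break the whole file rather than one page.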
This is not a back-end programming question. I can only modify the markup or script (or the document itself). The reason I'm asking here is that all my searches for appropriate terms inevitably lead to questions and solutions about programming this functionality. I'm not trying to force it via programming; I have to find out why this PDF is behaving differently.
So:
I have a bunch of links to PDFs on a page. Most of them open in new tabs, but one of them, the most recent, starts to open in a tab, but then the tab closes and the PDF gets downloaded as a file instead. All markup is consistent - there's nothing different about the odd man out except the actual URL.
You can see this here:
http://calwater.mwnewsroom.com/Investor-Relations/Financial-Reports/Annual-Reports
All annual reports up to 2012 open in a new tab, but 2013 downloads instead.
This leads me to believe that there is some meta-data property of the PDF itself that tells it how to open, and that, in this case, the 2013 PDF was created using different settings.
Apparently, the PDF was saved out to PDF from InDesign.
Does anyone have any insight?
Problem solved. There was simply an error in the string (like an extra period) that references the attachment such that it couldn't tell it was a PDF. Fixing the reference fixed the problem.
I am writing a Word 2007 document with a lot of images that are sure to change before the document is delivered. Therefore, I am inserting them in the document as links to PNG files. My problem is that if I select the image and execute:
Selection.InlineShapes(1).LinkFormat.Update
MsgBox Selection.InlineShapes(1).LinkFormat is Nothing
the message box displays "True". That is, the Update method broke the link.
I have tried using Selection.InlineShapes(1).Delete, followed by Selection.InlineShapes.AddPicture. This updates the image, but now I need to crop the image and that introduces its own set of problems. Before trying to deal with the cropping issues, I'm hoping that someone has a better way of updating the linked file.
BTW, closing the document and reopening it updates the image nicely as long as the filename has not changed. The point of the macro is to cope with filename changes, if necessary.
Situation:
(Large) PDFs are stored on an iOS device
The PDFs are encrypted using the Rijndael algorithm
When tapping one of the PDFs, it gets decrypted and afterwards viewed using a PDF viewer I implemented. The viewer is using the Core Graphics functionality to render the document page by page.
Issue:
With the documents being large enough, decryption will take a while.
Viewing can only be started after the whole document has been decrypted into a temp file.
I'm wondering, if there is a way to...
Pass some kind of stream to CGPDFDocument instead of a file URL
Or any other alternative to be able to view as many pages as possible while decryption continues in the background?
If you cannot split your original PDF files down to single pages (as I suspect), then the following approach should work:
A: When still decrypting:
try to open the PDF document as you already do;
try accessing the document page you are interested in;
if it does not fail, render the page;
if it fails, then you know that page is not available yet (while decrypting);
while decrypting, release the pdf document each time you try to get a new page.
B: when decryption is done: do as you are already doing.
Please note that this is just a suggestion, I have not tried this while decrypting a document, but if point 1. does not fail, then this should work.