Downloaded octet-stream, then encoding as PDF; can't get line endings worked out

Tools that I'm using for this:
Chrome
Notepad++
Sublime Text 3
Fiddler
WinMerge
Adobe Acrobat Reader X
Synopsis
I have downloaded a PDF twice: once through Chrome as an experimental control, and once again through a raw GET request via Fiddler, which returns an octet-stream. So far, I can save the octet-stream as a PDF and get the proper page count and some of the page headers and numbers, but very little of the body content loads. When I open my file in Adobe Reader X, I get this error:
Cannot extract the embedded font 'LFIDTH+ArialMT'. Some characters may not display or print correctly
and I cannot work out why the font can be extracted from the 'true' PDF but not from the one I am saving.
Details
As for my manual pull of the file, I provided
Accept: application/pdf, application/x-pdf, application/x-gzpdf, application/x-bzpdf
The server sent me back an application/octet-stream with a Content-Disposition: attachment header.
So to recap:
A valid Foo.pdf sitting on my hard drive
An HTTP response with an octet-stream version of the same file, in UTF-8 encoding (I assume)
Here is what I know:
I pulled the message body of the response from the server and dumped it to a file. I then ran a WinMerge comparison of it against the contents of the PDF, and every line mismatched on line endings. I re-encoded the EOLs for Unix and the diff shrank to ~1k lines out of 160k. A close inspection of the mismatches indicates that the valid PDF contains what looks like a NUL (0x00) character in places where my octet-stream contains literal spaces. Also, WinMerge reports the 'true' PDF as EOL: LF 1252 Mixed, while my 'raw' PDF reports 1252 Unix. When I homogenize the 'true' PDF to 1252 Unix, I get the same issue as I described for the 'raw' one.
Is there anything I can do to get this mess of an octet-stream straightened out?
Note that the PDF that was downloaded through Chrome is historic. I have it on my machine, but I downloaded it "sometime in the past" and the request headers used for that GET are no longer available. Attempting to download through the browser "now" results in an error, but an explicit GET request against the resource through Fiddler returns the PDF as an octet-stream.

Well now....
In the Fiddler session:
Right-click the HTTP response with the application/octet-stream body | Save | Response | Response Body
If Content-Disposition: attachment;filename has been set on the response, the File Save dialog will be pre-populated with the filename.
Easy once you know it's there.
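Outside Fiddler, the same result can be had from a few lines of Python (an illustrative sketch; the URL is hypothetical). The key is to keep the payload as bytes end to end: a text-mode save is exactly what turns NUL bytes into spaces and rewrites the line endings described above.

```python
import urllib.request

def fetch_pdf(url: str, path: str) -> bytes:
    """Download a PDF and save it without any text-mode mangling."""
    req = urllib.request.Request(url, headers={"Accept": "application/pdf"})
    with urllib.request.urlopen(req) as resp:
        data = resp.read()  # raw bytes -- never decoded to str
    save_binary(data, path)
    return data

def save_binary(data: bytes, path: str) -> None:
    # "wb" preserves 0x00 bytes and CR/LF pairs exactly as received;
    # a text-mode ("w") save is what corrupts a binary format like PDF.
    with open(path, "wb") as f:
        f.write(data)
```
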

Related

TJ and Tj operators showing garbage values after decoding

I have used the zlib Python library to decode streams that were compressed using FlateDecode. Until now, all the PDF files I have worked with showed correct values in the Tj and TJ operators, but I am having trouble decoding this PDF, as I am not getting what's displayed in the PDF.
I am able to copy text from the PDF to Notepad without any issue, and pdftotext is also giving the expected results, with correct words as output.
I have also used Adobe Preflight to inspect the document's internal structure and double-check the decoded text I am getting via zlib, but even that shows garbage values that don't match what's displayed in the PDF.
Why do I get these garbage values in the text operators, and how is pdftotext still able to get the correct results?
Also, how do I get correct results via Python/zlib?
PDF File
The values in the TJ/Tj operators are PDF codepoints (normally one byte, sometimes two). You will need to see which font is in operation, then read the font encoding (there are many kinds). PDF text extraction is very hard. I wouldn't advise trying it yourself.
You have been lulled into a false sense of security by seeing PDF files in which the PDF codepoints happen to be exactly the same as the Unicode codepoints they represent, i.e. you have been looking at files which use simple font encodings.
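The decompression step itself is straightforward; the trap is what the decompressed bytes mean. A minimal sketch (the content stream here is invented for illustration):

```python
import zlib

# A fabricated FlateDecode content stream. After decompression, the
# operands of Tj/TJ are *font codepoints*, not necessarily Unicode.
raw = b"BT /F1 12 Tf (\x01\x02\x03) Tj ET"
compressed = zlib.compress(raw)      # what sits between stream/endstream

decoded = zlib.decompress(compressed)
assert decoded == raw

# The bytes inside (...) only become readable text after mapping them
# through the font's /Encoding or /ToUnicode CMap; with a subset font
# (e.g. LFIDTH+ArialMT) the raw codepoints usually look like garbage.
```

Tools like pdftotext get correct output because they perform that codepoint-to-Unicode mapping; zlib alone cannot.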

Is there a way to set the text format when sending content via the email alert node

When sending the contents of a text file via the Email Alert, it converts the text to UCS-2 Little Endian from UTF-8. Is there a way to force the text format so the payload matches the file that was generated?
I have tried generating the file in binary and turning off HTML Body. Do I need to use a formatter, maybe?
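One workaround, if the tool cannot be told which encoding to emit, is to normalize the payload yourself before comparing or re-sending it. A hedged sketch (this assumes the converted file carries the usual UTF-16 LE byte-order mark, FF FE):

```python
def to_utf8(data: bytes) -> bytes:
    """Normalize a text payload to UTF-8.

    UCS-2/UTF-16 little-endian files typically begin with the BOM FF FE.
    The "utf-16" codec uses the BOM to pick the byte order and strips it.
    """
    if data.startswith(b"\xff\xfe") or data.startswith(b"\xfe\xff"):
        return data.decode("utf-16").encode("utf-8")
    return data  # already single-byte/UTF-8: pass through unchanged
```
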

PDF contains encoded content

Through a web service I get a PDF response. When I hit the endpoint through my browser the file is downloaded and saved as I would expect.
The problem: when I open the downloaded PDF, the content appears to be encoded. If I paste the PDF content into something like MS Word or even Chrome, it becomes readable English.
I have opened the PDF using my code editor to inspect the metadata, but I don't know exactly what I'm looking for.
How can I get the content to display as readable English when the PDF is opened?
Any help would be much, much appreciated!
Here is a link to the PDF in question: testPDF
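When inspecting a PDF in a code editor, "encoded-looking" content is usually normal: page contents are typically Flate-compressed streams, not plain text. A quick sanity check is whether the file even starts with the PDF magic header; if it doesn't, the download itself was mangled. A small illustrative sketch (the sample bytes are invented, not the linked file):

```python
def looks_like_pdf(data: bytes) -> bool:
    # Every valid PDF begins with "%PDF-" followed by the version number.
    return data.startswith(b"%PDF-")

def pdf_version(data: bytes):
    """Return the version string (e.g. "1.7"), or None if not a PDF."""
    if not looks_like_pdf(data):
        return None
    return data[5:8].decode("ascii", "replace")
```
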

CFPDF addwatermark: any level of transparency is coming out opaque

When I watermark a PDF onto another PDF, any semi-transparency in the watermark PDF is made completely opaque. Is there anything I can do about this, or is this a limitation of CFPDF?
Server is CF9 with latest hotfixes.
Fun bit: when Chrome renders the final product, the transparency is preserved, but when Acrobat Pro renders it, it's opaque. I can print the final product to AdobePDF and it's accurately transparent, but then I don't get a consistent page size to send through our print shop, which is a showstopper.
Code added per request:
<cfpdf action="addwatermark"
source="#BackgroundPDF#"
copyfrom="#ForegroundPDF#"
destination="#DestinationPDF#"
foreground="yes"
opacity="10"
overwrite="yes"
position="#XYPositioning#"
rotation="#RotationIfRequired#"
showonprint="yes"
>
Additional detail I've discovered as I go along: in Acrobat Pro, I can go to Print Production > Output Preview and change the "Show" option to "Not DeviceCMYK", and I get my transparency back. But this is just a preview; how do I actually remove that colorspace from the PDF?
Thanks to the help provided by @mkl here, we were able to figure out how to monkey-patch the PDF binary. Then I just needed to be able to do so in CF. Since reading the file in as a text file causes problems due to character encoding, I did this:
1. Identify the text to change in the binary. This is what @mkl helped me with. The problem text is "/K true", which tells the PDF to use knockout groups; I'm sure that makes sense to PDF experts, but it's total Greek to me.
2. Read the PDF into ColdFusion as a binary: <cffile action="readbinary" file="#inputPath#" variable="input">
3. Encode the binary byte array to hex: <cfset temp=BinaryEncode(input,"Hex")>
4. Remove the, now hex, string I want removed: <cfset temp2 = ReplaceNoCase(temp,"2F4B2074727565","","All")><!--- 2F4B2074727565 is hex for /K true --->
5. Decode the hex back into a byte array: <cfset output = BinaryDecode(temp2,"Hex")>
6. Write the output file to the file system: <cffile action="write" file="#outputPath#" output="#output#" nameconflict="overwrite">
Now you have a PDF that looks as expected. The problem is that there is something wrong with it: if I open it, do nothing, and close it, I'm prompted to save. If I save it, I no longer have the issue. I figured that a CFPDF merge operation would do basically that without requiring user action, so I added this final step.
Resave the PDF with the merge command: <cfpdf action="merge" source="#outputPath#" destination="#outputPath2#" pages="1" overwrite="yes">
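The hex round-trip can also be done directly on bytes. One refinement worth noting: overwriting "/K true" with the same number of bytes (seven spaces) instead of deleting it keeps every byte offset, and therefore the cross-reference table, intact, which may avoid the "prompted to save" symptom. A Python transliteration of the approach (file names are hypothetical):

```python
def strip_knockout(data: bytes) -> bytes:
    """Overwrite every '/K true' (knockout-group flag) with spaces.

    Same-length replacement preserves all byte offsets in the file,
    unlike deleting the seven bytes, so the xref table stays valid.
    """
    return data.replace(b"/K true", b"       ")  # 7 spaces, same length

# Usage sketch:
# with open(input_path, "rb") as f:
#     patched = strip_knockout(f.read())
# with open(output_path, "wb") as f:
#     f.write(patched)
```
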

Save Livecycle PDF file before submitting to server

I have created a LiveCycle PDF form that includes a Submit button to send it as XDP (including the base64 encoded PDF) to a server that pulls out the XML data and saves that to a database and then pulls out the encoded stream, decodes it and saves that back as a PDF on the server.
The issue I am having is that once I open the PDFs made from the base64-encoded data, they seem to be empty. After some testing, I found that if I manually save the PDF before submitting it, the information entered up to that save is included in the encoded PDF (whereas the full data is included in the XML portion).
So my question is: is there a way to either
Automatically save the PDF, or otherwise preserve the data, so it is sent in the base64-encoded portion of the XDP? (preferable)
Recognize when the document has changed and ask the user to save the PDF before clicking Submit?
It seems the issue I described above was actually due to using Foxit Reader instead of Adobe Reader.
Adobe Reader, of course, requires Reader Extensions in order to be able to save form data and submit it.
Foxit does not have that restriction, but it does not embed the updated version of the PDF in the XDP XML data sent to the server. The only way around this would be to ensure the user saves the PDF first, which removes the Reader Extensions per Adobe's licensing requirements.
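For reference, the server-side decode step described in the question can be sketched in a few lines. This assumes the base64 payload sits in a single element; the element name "chunk" here is an invented placeholder, since the actual XDP element names depend on how the form packages the PDF:

```python
import base64
import xml.etree.ElementTree as ET

def extract_pdf(xdp_xml: str, tag: str = "chunk") -> bytes:
    """Pull the base64-encoded PDF out of an XDP envelope and decode it.

    'chunk' is an assumed element name for illustration only.
    """
    root = ET.fromstring(xdp_xml)
    node = root.find(f".//{tag}")
    if node is None or not node.text:
        raise ValueError("no embedded PDF payload found")
    return base64.b64decode(node.text)
```
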