What to use instead of QuickLook.framework to handle HTML? - objective-c

I was using QuickLook.framework to show a PDF file in the simplest way, but now I need to display an HTML document instead. What it displays, however, is plain unreadable text that starts with
%PDF-1.3 %Äåòåë§ó ÐÄÆ 4 0 obj << /Length 5 0 R /Filter /FlateDecode >> stream
So QuickLook obviously isn't a good fit for displaying HTML.
What can I use instead that works similarly?
Or can I adapt QuickLook to use it for HTML?

On iOS, UIWebView can display PDF files without problems. I have done this on numerous occasions and it is very simple; you get a lot for free (scrolling, zooming, etc.).
QuickLook cannot display HTML content.

Related

Is there any pdf analyser / debugger to debug a pdf file?

Is there a validator or PDF analyser that can tell me what is wrong with a PDF I created, ideally with a hint about which object in the file is at fault?
I would like to create PDF files myself and understand the format better. I think I am pretty close to a working PDF, but I cannot find the problem in it or work out why PDF readers are not able to read it.
Isn't there a program or an online service that can at least give me a hint about what is wrong with my PDF structure or where the problem occurs? How do you debug something like that?
Here is a link to the PDF (just the attached image converted to a PDF):
https://nonepatchwork.patchwork3d.de/create_pdf/created_pdf.pdf
Best regards and thank you very much in advance
Fuchur
The request for a validator or PDF analyser is off topic on Stack Overflow; it is better suited to the Software Recommendations site. In this answer, therefore, I'll focus on analyzing the provided example files.
created_pdf.pdf
Here a number of issues immediately leap to the eye, in particular:
Your page object 5 points to object 6 as content stream, but object 6 is not a content stream but an image xobject! (You probably meant to point at object 7.)
All your cross reference table offsets are wrong.
The Size entry in the trailer is wrong.
There is an /ID between trailer dictionary and startxref.
There probably are more issues, but start by fixing these.
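As a rough illustration of the second and third points, here is a minimal Python sketch of how the cross-reference offsets and the Size value could be recomputed from the file bytes. The function names are hypothetical, and the sketch assumes that every object header ("N G obj") starts at the beginning of a line terminated by a line feed, that object numbers are contiguous from 1, and that no stream data happens to contain a matching byte sequence.

```python
import re

def object_offsets(pdf_bytes):
    """Map object number -> (byte offset, generation) of its 'N G obj' header."""
    offsets = {}
    # Assumption: headers start at the beginning of a line (after b"\n").
    for m in re.finditer(rb"(?m)^(\d+)\s+(\d+)\s+obj\b", pdf_bytes):
        offsets[int(m.group(1))] = (m.start(), int(m.group(2)))
    return offsets

def build_xref(pdf_bytes):
    """Build a classic xref table for contiguous objects 1..n."""
    offsets = object_offsets(pdf_bytes)
    size = max(offsets) + 1  # Size is the highest object number plus one
    # Entry 0 is always the head of the free list.
    lines = [b"xref", b"0 %d" % size, b"0000000000 65535 f "]
    for num in range(1, size):
        if num not in offsets:
            raise ValueError("object %d is missing; sketch needs contiguous numbers" % num)
        offset, gen = offsets[num]
        # Each entry is 20 bytes: 10-digit offset, 5-digit generation, 'n', 2-byte EOL.
        lines.append(b"%010d %05d n " % (offset, gen))
    return b"\n".join(lines) + b"\n"
```

The value to write after startxref is then the byte offset of the "xref" keyword itself, and the Size entry in the trailer must match the count in the subsection header.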
created_pdf_2.pdf
Here you fixed the errors listed above, but the file still does not display as expected. Adobe Acrobat Reader in particular says:
Looking at the image dictionary the cause becomes clear:
6 0 obj <<
/Type /XObject
/Subtype /Image
/Width 595.276
/Height 841.89
/ColorSpace /DeviceRGB
/BitsPerComponent 8
/Filter/DCTDecode
/Interpolate true
/Length 707
>>
...
The values of Width and Height are floats, which is invalid: both entries must be integers. Furthermore, inspecting the actual image data in the stream shows that the values are completely incorrect; the image is only 20×20 pixels in size.
Thus, replace the Width and Height entries with
/Width 20
/Height 20
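Since the stream uses /DCTDecode, its data is a JPEG, and the true dimensions can be read from the JPEG frame header. The following Python sketch shows one way to do that; the function name is hypothetical, and it assumes the bytes passed in are the raw stream contents and that no length-less markers precede the frame header.

```python
import struct

# Start-of-frame markers (0xC4, 0xC8 and 0xCC are not frame headers).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}

def jpeg_size(data):
    """Return (width, height) read from the first SOF segment of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Payload: precision (1 byte), height (2 bytes), width (2 bytes).
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length
    raise ValueError("no SOF segment found")
```

The two numbers returned are what belongs in the Width and Height entries of the image dictionary; the size at which the image is drawn on the page is controlled separately by the transformation matrix in the content stream.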

Cannot select PDF from top to bottom

I'm using pdftotext to extract info from a pdf. Currently using the -raw option. I do have a few problems with the PDFs I'm working with. If I select the text from top to bottom it selects in the following fashion.
PDF content:
A
B
C
It selects A then C and then B. So when I extract the text it is presented in the same way. Is there a way to reformat the PDF so I can select the content from top to bottom?
NOTE: I'm aware that if I omit the "raw" option the layout will be preserved, but it seems to be buggy when the document includes tables so raw works better for me.
Yes, you can reformat the PDF so that the content is returned from top to bottom. This is not something that can be easily done using Adobe Acrobat or any other viewer that I am aware of and here is why.
From the documentation of pdftotext, the -raw option is defined as
Keep the text in content stream order. This is a hack which often "undoes" column formatting, etc. Use of raw mode is no longer recommended.
"content stream order" is the important piece in the description.
In PDFs, the content on the page does not have to be written in the content stream (the instructions that are interpreted to display the page) in the order that a human would read the content when the page is rendered. The internals of PDF do not care about the ordering; the format was designed to reproduce the same visualization of a document on a variety of platforms. Since all that matters to PDF is the visualization, applications or libraries that write PDF tend not to order the content stream in any meaningful way.
So you can reorder the instructions in a content stream so that they are in the order a human would read them, but it is not an easy task to do by hand. Using a library that understands PDF to manipulate the content stream would be one way of doing this. Another way is to look for a more advanced tool to extract text from the PDF (there are a number of tools that look at the placement of the content on a page rather than just where it appears in the content stream).
I am not aware of anything that will reorder the content stream in the PDF based on where the content appears on the page automatically though.
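To illustrate what placement-based extraction means, here is a small Python sketch. It assumes that some PDF library has already produced text fragments as hypothetical (x, y, text) tuples in content stream order, using PDF coordinates where a larger y is higher on the page; the function name and the tolerance value are made up for the example.

```python
def reading_order(fragments, line_tolerance=2.0):
    """Order (x, y, text) fragments top to bottom, then left to right."""
    # Larger y is higher on the page in PDF user space, so sort descending.
    ordered = sorted(fragments, key=lambda f: -f[1])
    lines = []
    for frag in ordered:
        # Fragments whose y is close to the current line's first fragment
        # are treated as belonging to the same line.
        if lines and abs(lines[-1][0][1] - frag[1]) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within a line, sort by x and join the pieces with spaces.
    return [" ".join(f[2] for f in sorted(line, key=lambda f: f[0]))
            for line in lines]
```

With fragments stored in the order A, C, B but positioned as in the question, `reading_order([(72, 700, "A"), (72, 650, "C"), (72, 675, "B")])` yields the lines in the order A, B, C. Real pages with columns or tables need more than this simple grouping.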

[Steganography] Hiding Data in PDF files

I'm trying to hide a file inside the code of a PDF file. I've already searched for information to help me. I've tried to uncompress the PDF using pdftk ( pdftk pdf.pdf output uncompress.pdf uncompress ). Then I tried different things, such as:
Insert commentary : I put " %TEXT_TO_HIDE " in the uncompress pdf file code.
add new object : I put " 0 0 obj << TEXT_TO_HIDE << endobj " in the uncompress pdf file code.
modify an existing object
Then I compress it using pdftk again.
In each case, I obtain a new PDF which looks different from the original. It's not corrupted, but images have different colors and some of the original text is missing.
So, do you know any rules for changing the code of a PDF without anyone noticing?
(PS : Sorry if my english is bad ^^ )
You cannot modify a PDF file in a text editor and expect the file to be still compliant in general. PDF is a binary format and you need to read the PDF specification to figure out how to modify it.
That said, there are heaps of places where you can "hide" information in a PDF document. The real question is how much data you want to hide, and for what purpose. The purpose typically determines how secure this needs to be.
As some examples:
1) PDF allows embedding complete files in the actual PDF file. This is not really secure as anyone with decent software can extract these files (but the file itself could still be secured of course).
2) PDF allows adding arbitrary objects anywhere (or almost anywhere) in the file. This is a great way to hide information, but someone with the right tools can browse the object tree (even if the file is compressed) and see what you did.
3) PDF allows adding for example white text on a white background or text behind other objects. Again, there are ways around this for people with the right software.
4) Adobe's PDF spec allows at least 1K of fluff after the %%EOF marker (although ISO 32000 does not). Keep in mind that this is visible to anyone opening the file with a decent text or binary editor. (Thanks Jongware).
In short, you need to define much better what exactly you want to accomplish and how "secure" secure is in your use case.
You should also consider how "robust" the method must be. Should someone be able to save your PDF file with Acrobat for example with the hidden code intact? Some of the above methods may not be robust enough to ensure that with absolute certainty.
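To show how visible the data from point 4 is, here is a short Python sketch that splits a file at its last %%EOF marker. The function name is hypothetical, and the sketch assumes the trailing data does not itself contain the byte sequence %%EOF.

```python
def split_at_eof(pdf_bytes):
    """Return (document, trailing) split just after the last %%EOF marker."""
    marker = b"%%EOF"
    idx = pdf_bytes.rfind(marker)
    if idx == -1:
        raise ValueError("no %%EOF marker found")
    end = idx + len(marker)
    # Everything after the marker is ignored by tolerant viewers,
    # but is in plain sight for anyone reading the raw bytes.
    return pdf_bytes[:end], pdf_bytes[end:]
```

Anything other than a line ending in the second part is extra data. A viewer that rewrites the file on save will most likely drop it, which is the robustness concern mentioned above.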

PDF special searching iOS

I know there is a great project for PDF searching on iOS: PDFKitten.
However, I have encountered some PDF files for which its search does not work. When I open these files in the Preview app on the Mac and search there, it works.
I uploaded one file here.
You can check by opening this file in the Preview app and searching for the word 'ra'. It works perfectly. But if you drag this file into the PDFKitten source, configure the project so that it opens the file, and then try to search, it doesn't work.
I inspected the source. It handles all the text-showing operators, including Tj, ', ", and TJ. I placed some log lines in these operators' callbacks and saw that the callbacks are not called.
Can you give me some suggestions or ideas?
If I understand the code correctly, PDFKitten looks for fonts only in the /Font entry of the /Resources dictionary of the page. At least that's my interpretation of the method fontCollectionWithPage of Scanner the result of which is queried by setFont in pdfScannerCallbacks to set the current font object.
Furthermore, there is no callback for the Do operator (i.e. the operator used to inject the contents of an XObject resource into the page content). Unless CGPDFScannerScan interprets this operator under the hood, the content of included XObjects is not scanned at all. This would match your observation that the text-showing operator callbacks never get called.
Your file mundo1.pdf, though, does not have any immediate /Font entry in the /Resources dictionaries of its pages. Instead, all the actual content of each page is wrapped in a single /XObject resource. Each of these XObjects in turn has its own /Resources dictionary, which contains a /Font entry defining the fonts used for the respective page.
Thus, PDFKitten does not know anything about the fonts used in your file, especially about their encodings, and so cannot extract the text from the PDF contents. Maybe it does not even get to see the PDF contents to interpret.
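The difference between looking only at the page's own /Font entry and descending into form XObjects can be sketched in Python. The nested dictionaries below are a hypothetical, simplified model of a page's /Resources, not PDFKitten's data structures, and the function name is made up for the example.

```python
def find_fonts(resources, path=()):
    """List (xobject_path, font_name) pairs, descending into form XObjects."""
    # Fonts declared directly in this resource dictionary.
    found = [(path, name) for name in resources.get("Font", {})]
    for xname, xobj in resources.get("XObject", {}).items():
        # Only form XObjects carry their own content and resources.
        if xobj.get("Subtype") == "Form":
            found += find_fonts(xobj.get("Resources", {}), path + (xname,))
    return found

# Simplified model of a page whose whole content sits in one form XObject.
page_resources = {
    "XObject": {
        "Xf1": {"Subtype": "Form", "Resources": {"Font": {"F1": {}}}},
    },
}
```

Here `page_resources.get("Font", {})` is empty, which corresponds to what a page-level lookup sees, while `find_fonts(page_resources)` reports F1 under Xf1. Font names are local to the resource dictionary that declares them, so a real scanner would have to switch font tables when it processes Do rather than merge everything into one table.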
I would, therefore, propose you post this issue on the PDFKitten issue management site.
By the way, this PDF construct fully conforms to the PDF spec. Nonetheless, it looks like an inappropriate use of the iText library. The author of the software using iText like that should review their code and start using better-suited classes of the iText library.

Append text to PDF in Coldfusion 8

I have a PDF that I want to append some text to. The addFooter() that is available in CF9 would work perfectly, but I only have access to CF8.
Does anyone have a workaround for this feature in CF8?
Thanks
Yes, even in ColdFusion 8 you can use DDX to add footers and headers to a PDF. See the specific Adobe 8 Livedocs on how to do this. I also have a couple of blog posts, 1 and 2, that might help. Although I tested on CF9, they contain information that is valid for CF8 as well. You might also want to get the almost-impossible-to-find DDX reference. Also check out ColdFusion Jedi's 8-part series on PDF manipulation in CF8.
UPDATE (Added information below on combining text):
To take PDF1 and PDF2 and put the text on a single page in resulting PDF, the first thing that comes to mind is that you could use cfpdf with the getinfo action to get the text (if you don't already have it in a plain text or HTML format). Then you could cfoutput the text into a cfdocument element of type pdf. That way you get a new merged PDF with the contents combined.