Is there a way to convert submittable/fillable PDF (containing JS) to HTML? - pdf

I have a huge set of PDF files which use forms and JavaScript to submit them. I'm wondering if there's a way of converting such PDFs into HTML (or any other format except Flash which would allow for opening the page in a Web browser and submitting it).
After researching the topic, I found several pieces of software that convert PDF to HTML, but even when some fields survive the conversion, the JS is apparently gone and all the buttons are missing.
Edit: The number of documents to convert is roughly 500, so a "by hand" method is out unless it's a bit of "by hand" work followed by a bulk conversion.

I'm not sure whether this PDF converter will help. It can convert PDF to HTML; however, I'm not sure whether it will preserve your buttons and JS. Maybe you can give it a try. Hope this helps.

Related

Get text from a pdf in NSString

I am trying to make an iOS app that extracts plain text from a PDF file and displays it in a UITextView. It is not simply a PDF reader for viewing a file; I later want to perform certain operations on that text.
I have already googled a lot but have not found an exact solution.
I already tried using https://github.com/zachron/pdfiphone
but its files target the ARMv6 architecture, which seems obsolete with Xcode 4.5.
If anyone can suggest some exact, non-confusing code using the Quartz 2D framework on iOS, that would be great.
Here is some sample code to extract text from a PDF; I hope it helps.
https://github.com/zachron/pdfiphone
This is a library to get the text out of a PDF for the iPhone.
There is another demo, which uses OCR technology; find the link below.
https://github.com/nolanbrown/Tesseract-iPhone-Demo
Also check this page of the Quartz 2D Programming Guide; it covers everything you need to open and parse a PDF file in iOS. Note that it is not a simple task, since there's no method that extracts the full text in one line. You have to work with the data as an input stream, using a CGPDFScanner.
Two other libraries:
https://github.com/KurtCode/PDFKitten/
https://github.com/mobfarm/FastPdfKit
This question comes up all the time. It is VERY hard to extract text from PDF in general. The PDF specification is not designed with text extraction in mind. There are many libraries that try to do the job, essentially by reconstructing the text from the geometric placement of the individual glyphs. These libraries have varying degrees of success, but all of them will fail on certain PDF documents. In fact, some PDF documents have glyphs but no way to associate a glyph with a character. For these documents it is simply not possible to extract text, short of using some kind of OCR approach.
PDF is designed as a read-only format that is portable in the sense that a PDF document will be rendered identically on any platform. That is what it is best at, and what it should be used for.
If text is to be edited, do not use PDF.
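To make "reconstructing the text from the geometric placement of the individual glyphs" concrete, here is a minimal sketch in Python. It assumes some PDF parser has already produced glyphs as hypothetical (x, y, width, char) tuples; the function name, the tuple layout and both thresholds are illustrative assumptions, not part of any real library.

```python
from collections import defaultdict


def glyphs_to_text(glyphs, line_tolerance=2.0, space_gap=2.0):
    """Rebuild text lines from positioned glyphs.

    glyphs: iterable of (x, y, width, char) tuples (hypothetical layout).
    line_tolerance: glyphs whose y values fall in the same bucket of this
        size are treated as one line.
    space_gap: a horizontal gap larger than this between two glyphs is
        treated as a word space.
    """
    # Bucket glyphs into lines by their baseline y coordinate.
    lines = defaultdict(list)
    for x, y, width, ch in glyphs:
        lines[round(y / line_tolerance)].append((x, width, ch))

    out = []
    # PDF user space has y growing upward, so the top line has the largest y.
    for key in sorted(lines, reverse=True):
        text = ""
        prev_end = None
        for x, width, ch in sorted(lines[key]):
            # Insert a space when the gap to the previous glyph is large.
            if prev_end is not None and x - prev_end > space_gap:
                text += " "
            text += ch
            prev_end = x + width
        out.append(text)
    return "\n".join(out)
```

Real documents break these assumptions quickly (rotated text, multiple columns, glyphs with no character mapping), which is why such heuristics fail on some PDFs.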
Here (Extracting text from pdf using objective-c), I found an answer to your question, and it works, but not as well as I need :(
It can extract only ASCII.
It returns only one paragraph.
Good luck.

How to create and save a .rtf, .doc, .docx in Objective-C for iOS

I am looking to create and save either a rtf, doc or docx file on an iPad (iOS).
The scenario is that we'd like to assist a user in creating content on their iPad and then let them email this as an editable document cross-platform (OS X, WIN).
I am open to other solutions besides the rtf, doc or docx file format.
Thanks,
James
RTF is going to be the easiest, because it's a plain text format. It's kind of like HTML, but without closing tags. Here is a class for writing an RTF, but it requires a lot of dependencies from elsewhere in the framework.
DOCX would be rather difficult. It's actually a zip file containing a few XML files. You can examine the format yourself by changing the .docx extension to .zip and unzipping it. But even though XML is a fairly easy format to write, the way the text attributes are organized is still rather complicated. Also, I recall that it has to be zipped in a very specific way to be read properly.
As for DOC, it will be very difficult because it's such a complex format. You could look into some open source projects, like Abiword or Word2x. Be careful using their code because the licenses may not agree with the App Store rules.
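The point that a DOCX is a zip archive of XML files can be checked without renaming anything. The sketch below is Python rather than Objective-C and only inspects a package; the function names are made up for illustration, and "word/document.xml" is assumed to be the main document part, which is the usual location in files written by Word.

```python
import io
import zipfile


def list_docx_parts(docx_bytes):
    """Return the names of the files stored inside a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.namelist()


def read_document_xml(docx_bytes, part="word/document.xml"):
    """Return the XML text of the main document part (assumed location)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.read(part).decode("utf-8")
```

Reading the bytes of any .docx file and passing them to list_docx_parts shows the handful of XML files the answer refers to.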
I've seen doc & docx readers for iPhone (App store entry linked here), but I don't know of any open source frameworks you can make use of.
The RTF format should be pretty simple to write, if you're up to the challenge. There is no built-in framework support for it (here's a related question, by the way).
Maybe you could write out something in a regular TEXT format and e-mail that?
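As a rough idea of how little a minimal RTF file needs, here is a sketch in Python (not Objective-C) that wraps plain text in an RTF header and escapes the characters RTF treats specially. The function names, the Helvetica font and the 12 pt size are assumptions chosen for illustration; formatting such as bold or tables would need more control words.

```python
def rtf_escape(text):
    """Escape plain text for use inside an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ch == "\n":
            # Paragraph break.
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN per UTF-16 code unit (signed 16-bit),
            # each followed by '?' as the fallback character.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                unit = int.from_bytes(units[i:i + 2], "big")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def make_rtf(text):
    """Wrap plain text in a minimal RTF document (one font, 12 pt)."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Helvetica;}}\\f0\\fs24 "
    return header + rtf_escape(text) + "}"
```

The returned string can be written to a file with an .rtf extension and attached to an email like any other document.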
Docmosis has a cloud service that you can reach from iOS. You can ask it to render a document in various formats (doc, rtf, pdf, odt, etc.) and email it off or stream it back, though you have to be connected. Previewing DOC on iOS is possible but a little flaky. One option is to stream a PDF back for display on iOS and email the editable document (which can be done in one call).

How to convert PDF file to .doc format in Objective-C?

Right now I am working on an iPad application that lets the user open a PDF file and customize it. Now I want to add a feature that converts that PDF file to .doc format.
I researched this but did not find a way to do it. Can anybody help me out?
Thanks in advance.
I wrote an article on PDF to text conversion issues. If you look at some of the existing PDF to Word conversion tools (e.g. BCL), you will see what is realistically possible with a lot of work.
It’s not possible to convert a generic PDF back into a text format. I guess you could render the PDF into images and create a DOC from those, but that doesn’t sound very useful.

web based form to collect data and populate to a fillable PDF

Can anyone suggest a script that would let me create an HTML or PHP web-based form to collect data and save it, and then load that data to populate a fillable PDF?
If you have an existing PDF that you want to populate, and that PDF just has text fields (no checkboxes or radio buttons) then CAM::PDF may be able to help you. You can use it as a Perl library directly, or use its command-line interface. CAM::PDF is not useful for generating PDFs from scratch, however. Furthermore, if you have embedded fonts, then you need to ensure that all of the characters you plan to insert are represented in the embedded font.
Use a normal web page to get the data. If you are not sure how to do it, search for "php forms" on Google; there are plenty of tutorials.
Then use a PHP PDF generator, like this one, to create the PDF file. If you look hard enough, you will probably find a PDF generator that lets you use a template with placeholders where the entered data should go.

How can I embed a PDF in an email?

I've already referred to this SO post. I've been embedding images using an AlternateView for PNG files. Now I'm wondering how to do it with PDFs.
Should it work, for the LinkedResource, to just say:
Dim document As New LinkedResource(pdfFilePath, "image/pdf")
I'm just trying to figure out how to get the PDF to be embedded like I could with an image, or is that not possible and I'll have to do it as an attachment?
You can embed images since an email client can render them in place. PDFs cannot be rendered that way, so I'd recommend either including a thumbnail of the PDF that links to the actual PDF on your web site, or just attaching the PDF to the email message.
There are a few options that I know of.
1) Is the simplest way okay? The easiest by far would be to attach the PDF as a normal attachment. Then render the first page of the PDF as an image, embed it in the email, and link it to open the PDF if you can. Entourage kind of does this on the Mac.
Alternatively, what I found was the following:
2) Flashpaper embedded in HTML, displaying a PDF. Adobe has a technology called Flashpaper, a Flash-based file viewer. It can display documents in Flashpaper format, or use PDFs as the source.
Check out some examples. That's really Flash. http://www.adobe.com/products/flashpaper/examples/
Assuming you send an HTML email that gets through (images aren't turned off, etc.), you can embed the Flashpaper viewer right in your HTML code as a normal Flash object.
Most HTML email clients use bits of Internet Explorer, WebKit, or Gecko to render the HTML. Flash Player is installed on nearly everything, so it works well. A good example of this is when we open an email and it has video playing in it. It's almost always Flash.
I have had luck doing it this way -- the only thing you'd have to decide is if most of your clients can see this and how much (if any) today's software might block it.
What I ended up doing was a hybrid. 1) Attach it to the email, 2) Embed the Flashpaper viewer. They get it either way.
Flashpaper is available separately for $75. It has come in handy where the client was not able to install Adobe Acrobat on each computer and the solution had to be 100% web based.
I would imagine you should be able to do the same using any language with a little more effort and using something like Flashpaper.
Hope that helps
This is not possible--at least not in a way that will work with many clients. You'll need to just attach the file.
If you have only one client to worry about, it might be possible--but not likely without manually changing settings on each client.
The MIME type of a PDF is "application/pdf", not "image/pdf".
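For comparison, here is a minimal sketch of the "just attach it" approach with the correct MIME type. It is written in Python with the standard library's email package rather than the VB.NET of the question, and the function name and its parameters are illustrative assumptions.

```python
from email.message import EmailMessage


def build_message_with_pdf(sender, recipient, subject, body, pdf_bytes, filename):
    """Build an email whose PDF is a regular attachment typed application/pdf."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    # maintype/subtype together form the MIME type "application/pdf".
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=filename,
    )
    return msg
```

The resulting message can be handed to smtplib's send_message; whether a given client previews the PDF inline is up to that client.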