Convert .mht files to PDF files using Java

Is it possible to convert MHTML (.mht) files to PDF with proper CSS rendering?
I tried using pd4ml, but the external CSS and links referred to in the .mht file fail to load in the generated PDF.

You could try unpacking the MHTML to HTML and separate files, then using your pd4ml method to generate the PDF.
Chilkat Java MHT is one solution you can look into, although you will need a license after the 30-day trial.
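The unpacking step can be done with any MIME parser (JavaMail's MimeMultipart, for example). The sketch below uses only the JDK, so it covers just the common case: a multipart/related archive whose parts are base64, quoted-printable or plain 7bit/8bit. The class and method names are hypothetical. Each part's Content-Location header holds the URL the HTML refers to, so after writing the parts to disk you would still need to rewrite those URLs to the local file names before handing the HTML to pd4ml.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MhtUnpacker {

    /** One MIME part: lower-cased header names mapped to values, plus the decoded body. */
    public record Part(Map<String, String> headers, byte[] body) {
        public String location() { return headers.getOrDefault("content-location", ""); }
    }

    private static final Pattern BOUNDARY =
            Pattern.compile("boundary=\"?([^\";\\n]+)\"?", Pattern.CASE_INSENSITIVE);

    /**
     * Splits a multipart/related MHTML document into its parts.
     * Read the .mht file as ISO-8859-1 so that every char maps 1:1 to a byte.
     */
    public static List<Part> unpack(String mht) {
        // Normalise CRLF to LF so the header/body separator is always "\n\n".
        String text = mht.replace("\r\n", "\n");
        int headerEnd = text.indexOf("\n\n");
        if (headerEnd < 0) throw new IllegalArgumentException("no top-level header block");
        Matcher m = BOUNDARY.matcher(text.substring(0, headerEnd));
        if (!m.find()) throw new IllegalArgumentException("no multipart boundary found");
        String delimiter = "--" + m.group(1).trim();

        List<Part> parts = new ArrayList<>();
        String[] chunks = text.substring(headerEnd).split(Pattern.quote(delimiter), -1);
        // chunks[0] is the preamble; a chunk starting with "--" follows the closing delimiter.
        for (int i = 1; i < chunks.length && !chunks[i].startsWith("--"); i++) {
            String chunk = chunks[i].substring(chunks[i].indexOf('\n') + 1); // rest of delimiter line
            if (chunk.endsWith("\n")) chunk = chunk.substring(0, chunk.length() - 1); // LF before next delimiter
            int sep = chunk.indexOf("\n\n");
            String headerBlock = sep < 0 ? chunk : chunk.substring(0, sep);
            String rawBody = sep < 0 ? "" : chunk.substring(sep + 2);

            // Unfold continuation lines, then parse "Name: value" headers.
            Map<String, String> headers = new LinkedHashMap<>();
            for (String line : headerBlock.replaceAll("\n[ \t]+", " ").split("\n")) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                                line.substring(colon + 1).trim());
                }
            }
            String encoding = headers.getOrDefault("content-transfer-encoding", "")
                                     .toLowerCase(Locale.ROOT);
            byte[] body = switch (encoding) {
                case "base64" -> Base64.getMimeDecoder().decode(rawBody);
                case "quoted-printable" -> decodeQuotedPrintable(rawBody);
                default -> rawBody.getBytes(StandardCharsets.ISO_8859_1); // 7bit/8bit text
            };
            parts.add(new Part(headers, body));
        }
        return parts;
    }

    /** Minimal quoted-printable decoder: "=XX" hex escapes and "=" soft line breaks. */
    static byte[] decodeQuotedPrintable(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '=' && i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                i++; // soft line break: skip "=\n"
            } else if (c == '=' && i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }
}
```

To use it, read the file with Files.readString(path, StandardCharsets.ISO_8859_1), call unpack, and write each part's body() to a file named after its location().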

Related

Font changes when converting PDF to PowerPoint (PPT)

In my project there is a function that produces a PDF file based on a custom template for each customer. To do that I use SelectPDF: I build the content with ASP.NET Razor files and export it as a PDF. Sometimes I'm asked for specific fonts, so I add them to the template's .LESS file using @font-face.
Now I have a task to create a PDF that may later be converted to a PowerPoint file via Adobe. The problem is that Adobe doesn't know how to handle the custom fonts I use.
For example, this is what I get in the PDF:
And this is what happens after I convert it via Adobe:
Is it possible to keep the custom fonts from the PDF export in the PowerPoint file?
Generally, for such a font to be supported, it should be fully embedded as a font, not subset (embedded only as the range of characters actually used).
The full TTF name should then appear in the PDF's font list.
If the pdf2pptx converter is doing its job correctly and has access to a fallback, it should then use RockwellNova, as seen here without embedding and with embedding.
However, since PPTX files rarely embed fonts, it will come down to system font substitution when the PPTX is viewed.
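To check whether a font in the PDF is fully embedded or only subset, look at its name. The PDF specification marks a subset font by prefixing its PostScript (BaseFont) name with a tag of exactly six uppercase letters and a plus sign, e.g. ABCDEF+RockwellNova, and the Fonts tab of Acrobat's Document Properties lists such fonts as "Embedded Subset". The helper below is a small sketch that classifies a name once you have it from a font listing; the class name and sample names are illustrative only.

```java
import java.util.regex.Pattern;

class FontSubsetCheck {

    // Subset tag: six uppercase letters followed by '+', e.g. "ABCDEF+RockwellNova".
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    /** Accepts names with or without the leading '/' used in PDF name objects. */
    private static String strip(String baseFontName) {
        return baseFontName.startsWith("/") ? baseFontName.substring(1) : baseFontName;
    }

    /** True if the font carries a subset tag, i.e. only the used glyphs are embedded. */
    public static boolean isSubset(String baseFontName) {
        return SUBSET_TAG.matcher(strip(baseFontName)).find();
    }

    /** The font's real name with any subset tag removed. */
    public static String realName(String baseFontName) {
        return SUBSET_TAG.matcher(strip(baseFontName)).replaceFirst("");
    }

    public static void main(String[] args) {
        for (String f : new String[] {"/ABCDEF+RockwellNova", "/RockwellNova-Bold"}) {
            System.out.println(f + " subset=" + isSubset(f) + " name=" + realName(f));
        }
    }
}
```

If your export shows the custom font as a subset, that is the case the answer describes as unlikely to survive the PowerPoint conversion.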

cfdocument not converting Word Document to PDF correctly

cfdocument in ColdFusion 11 is not converting my Word documents to PDF correctly. I have OpenOffice 4.1.3 installed and configured in the CF Administrator. I can open the source document in OpenOffice and export it to PDF without issue. However, when I run the following code, the resulting PDF is "gobbledygook":
<cfdocument
format="pdf"
srcfile="#_tempSourceFilePath#"
filename="#_destinationFilePath#" />
Here is an excerpt of the resulting PDF (the screenshot shows the Developer edition, but the same thing happens with a Standard installation):
I can't figure out why this is happening. Any ideas?
The problem is:
srcfile="#_tempSourceFilePath#"
This is apparently the path to a binary file that is not browser-writable. A necessary condition for the srcfile attribute is that the file be browser-writable, that is, usable by a browser without the need for a plugin.

I found a file in raw lyx output, how do I create a readable pdf or txt file from this mess?

https://raw.githubusercontent.com/jarcane/bedroom-wall-press/master/hulks-and-horrors/HnHCompanionI.lyx
I have installed LyX and tried pasting the text into it. I have also tried pasting it into OpenOffice, exporting it as plain text, and then importing that plain text into LyX. Either way, the formatting code is always included when I export the file as PDF or text.
I just want the human readable portion of the document.
Any help would be appreciated, thank you.
The LyX file you link to is indeed a valid .lyx file. To use it, do the following:
Download the file. The easiest way to do this is to just run
wget "https://raw.githubusercontent.com/jarcane/bedroom-wall-press/master/hulks-and-horrors/HnHCompanionI.lyx"
Open the file in LyX.
Compile to PDF by clicking the "eyes" icon, or go to File > Export > PDF (pdflatex), in which case a .pdf file will be created in the same directory as the .lyx file.
Note that the .lyx file depends on other files. For example, it includes an image with the path "C:/Users/BearBear/Google Drive/Hulks and Horrors B&W Logo for Print.png".
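Before compiling, you can list which external graphics a .lyx file needs and whether they exist on your machine. The sketch below assumes LyX writes each graphic as a \begin_inset Graphics block containing an indented filename line (check this against your own file); the class name is hypothetical. Relative paths are resolved against the .lyx file's directory, so an absolute Windows path like the one above simply shows up as MISSING elsewhere.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LyxDependencies {

    /**
     * Returns the paths named on "filename ..." lines inside graphics insets.
     * Assumption: each graphic is "\begin_inset Graphics" followed by an
     * indented "filename <path>" line and closed by "\end_inset".
     */
    public static List<String> graphicsFiles(List<String> lyxLines) {
        List<String> files = new ArrayList<>();
        boolean inGraphics = false;
        for (String line : lyxLines) {
            String t = line.trim();
            if (t.startsWith("\\begin_inset Graphics")) {
                inGraphics = true;
            } else if (t.startsWith("\\end_inset")) {
                inGraphics = false;
            } else if (inGraphics && t.startsWith("filename ")) {
                String path = t.substring("filename ".length()).trim();
                if (path.length() >= 2 && path.startsWith("\"") && path.endsWith("\"")) {
                    path = path.substring(1, path.length() - 1); // strip optional quotes
                }
                files.add(path);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path lyx = Path.of(args[0]);
        Path dir = lyx.toAbsolutePath().getParent();
        for (String f : graphicsFiles(Files.readAllLines(lyx))) {
            // Relative paths are resolved against the .lyx file's directory.
            boolean exists = Files.exists(dir.resolve(f));
            System.out.println((exists ? "found   " : "MISSING ") + f);
        }
    }
}
```

Run it as java LyxDependencies HnHCompanionI.lyx; any MISSING entry needs to be found or removed from the document before pdflatex can succeed.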
It is possible that you won't be able to compile the document because of the missing .png or because you do not have a complete TeX installation. In this case, you can simply read the document in LyX. It is not as pretty as in the PDF but it is certainly readable in my opinion.
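If you only want the human-readable text and cannot compile at all, a crude fallback is to strip the LyX markup yourself. This sketch (hypothetical class name) keeps non-command lines that sit directly inside a \begin_layout ... \end_layout pair and skips inset parameter lines such as filename or status. It loses formatting, quote marks and special characters, and text from nested insets (notes, footnotes) comes out as its own paragraph, before the paragraph that contains it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class LyxText {

    /** A currently open \begin_layout or \begin_inset; only layouts collect text. */
    private record Frame(boolean layout, StringBuilder text) {}

    /** Very rough plain-text extraction: one string per LyX paragraph. */
    public static List<String> paragraphs(List<String> lyxLines) {
        List<String> out = new ArrayList<>();
        Deque<Frame> open = new ArrayDeque<>();
        for (String line : lyxLines) {
            if (line.startsWith("\\begin_layout")) {
                open.push(new Frame(true, new StringBuilder()));
            } else if (line.startsWith("\\begin_inset")) {
                open.push(new Frame(false, null));
            } else if (line.startsWith("\\end_layout") || line.startsWith("\\end_inset")) {
                if (open.isEmpty()) continue;
                Frame f = open.pop();
                if (f.layout()) {
                    // Text is wrapped across lines in the file; rejoin and collapse spaces.
                    String p = f.text().toString().replaceAll("\\s+", " ").trim();
                    if (!p.isEmpty()) out.add(p);
                }
            } else if (!line.startsWith("\\") && !open.isEmpty() && open.peek().layout()) {
                open.peek().text().append(line).append(' ');
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String p : paragraphs(Files.readAllLines(Path.of(args[0])))) {
            System.out.println(p);
            System.out.println();
        }
    }
}
```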

Grails find/read text from pdf file

We are using Grails 2.1.1 and want to search for contact numbers in an uploaded PDF file. We have already done this with .doc files, but now we want to search for and extract contacts from PDF files as well.
Is there any way to search and extract text from PDF files in Grails?
Have you looked at Apache Tika?
It should handle both of these formats and save you the time of handling each type separately.
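Once Tika has turned the .doc or .pdf into plain text (for example with the parseToString method of the org.apache.tika.Tika facade), the contact search is the same for both formats. The sketch below runs a deliberately loose, hypothetical phone-number pattern over that text and keeps matches with 7 to 15 digits. Expect false positives such as dates, and adjust the pattern to the number formats you actually receive; the class name is also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContactNumberFinder {

    // Hypothetical pattern: optional "+" and "(", then a digit, at least five more
    // digits/spaces/parentheses/dots/dashes, ending in a digit. Newlines are not allowed
    // inside a match, so numbers on different lines are not merged.
    private static final Pattern PHONE = Pattern.compile("\\+?\\(?\\d[\\d ().-]{5,}\\d");

    /** Returns candidate numbers whose digit count is between 7 and 15. */
    public static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PHONE.matcher(text);
        while (m.find()) {
            String candidate = m.group().trim();
            int digits = candidate.replaceAll("\\D", "").length();
            if (digits >= 7 && digits <= 15) found.add(candidate);
        }
        return found;
    }
}
```

In Grails 2.x such a class can live under src/java and be called from a service after the upload has been parsed.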

Extract "cover image" from CHM and EPUB files

How can I programmatically and reliably create PNG images from CHM and EPUB files? The page that's needed is only the first one, as in "cover image thumbnail generation".
Could this even be done just from the command line?
I have already looked for code that does this in the open-source CHM QuickLook plug-in for Mac OS X and in Calibre, the latter to no avail.
In CHM, the default page is a webpage (a .html file). Of course it can only contain one page.
An extractor program is easy to write based on chmlib or Free Pascal's libraries, but it will need to parse the HTML to find the names of the other files to extract. Roughly, the algorithm would be:
1. Use the "list" function of a command-line extract tool to get the default page's name (it is stored in an internal record).
2. Extract that page and parse it for img and other referencing tags.
3. Extract the referenced files.
The biggest picture extracted in the previous step is probably "it"!
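The parsing and pick-the-biggest steps of that algorithm can be sketched in Java as below, assuming listing the CHM and extracting the files is done by whatever CHM tool you use and the extracted files land under one directory. Class and method names are hypothetical, and the regex is a rough stand-in for a real HTML parser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChmCoverPicker {

    // Rough match for <img ... src="..."> or src='...'; not a full HTML parser.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Collects the image paths referenced by the default page's HTML. */
    public static List<String> imageRefs(String html) {
        List<String> refs = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) refs.add(m.group(1));
        return refs;
    }

    /** Among the referenced images already extracted under baseDir, picks the largest file. */
    public static Optional<Path> largestImage(Path baseDir, List<String> refs) throws IOException {
        Path best = null;
        long bestSize = -1;
        for (String ref : refs) {
            if (ref.contains("://")) continue; // external URL, not inside the CHM
            String rel = ref.replace('\\', '/').replaceFirst("^/+", ""); // archive-relative path
            Path p = baseDir.resolve(rel).normalize();
            if (!Files.isRegularFile(p)) continue; // referenced but not extracted
            long size = Files.size(p);
            if (size > bestSize) {
                best = p;
                bestSize = size;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

The chosen file can then be scaled down to a PNG thumbnail, for example with javax.imageio.ImageIO.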