I'm trying to find a way to convert a DOCX file to an image-only PDF, so that I can put a watermark on the PDF right after conversion.
I've looked through the ConvertAPI documentation and can't find an option for this.
First convert the .docx file to a .jpg:
https://www.convertapi.com/docx-to-jpg
then feed the .jpg to
https://www.convertapi.com/jpg-to-pdf
to generate the .pdf.
You can chain the two API calls this way to get the image-only PDF you want.
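A minimal sketch of how the two hops line up, assuming ConvertAPI's REST endpoints follow the pattern https://v2.convertapi.com/convert/<from>/to/<to> (that base URL is an assumption here, so check it against the current ConvertAPI docs). The helper name chain_endpoints is hypothetical; it only plans the calls and does not touch the network.

```python
# Assumed base URL for ConvertAPI's REST interface; verify against the current docs.
BASE_URL = "https://v2.convertapi.com/convert"


def chain_endpoints(formats):
    """Return one endpoint URL per hop in a conversion chain.

    formats is an ordered list such as ["docx", "jpg", "pdf"]; each
    adjacent pair of formats becomes one conversion call.
    """
    if len(formats) < 2:
        raise ValueError("need at least a source and a target format")
    return [f"{BASE_URL}/{src}/to/{dst}" for src, dst in zip(formats, formats[1:])]


if __name__ == "__main__":
    # docx -> jpg, then jpg -> pdf
    for url in chain_endpoints(["docx", "jpg", "pdf"]):
        print(url)
```

The JPG files returned by the first call are what you upload to the second call, so the final PDF contains only page images.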
Related
Is it possible to use Meilisearch to search the contents of PDF and DOCX files? If yes, what is the process for indexing and searching?
It's currently not possible to index PDF or DOCX files directly with Meilisearch. You have to extract the text from your files and push that content into Meilisearch. The content types currently accepted are JSON, CSV, and NDJSON.
Here you can find a discussion where a user explains his approach: https://github.com/meilisearch/product/discussions/164
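A rough sketch of the "extract, then push" step described above, assuming the text has already been pulled out of each file by some other tool. The function names and the field names "filename" and "content" are illustrative choices, not anything Meilisearch requires; only the "id" field is used as the primary key here.

```python
import json


def build_documents(extracted):
    """Turn {filename: extracted_text} into a list of JSON-ready documents.

    Files are sorted by name so the integer ids are stable between runs.
    """
    return [
        {"id": i, "filename": name, "content": text}
        for i, (name, text) in enumerate(sorted(extracted.items()), start=1)
    ]


def to_ndjson(documents):
    """Serialise documents as NDJSON: one JSON object per line."""
    return "\n".join(json.dumps(doc, ensure_ascii=False) for doc in documents)


if __name__ == "__main__":
    docs = build_documents({
        "report.pdf": "quarterly numbers",
        "notes.docx": "meeting notes",
    })
    print(to_ndjson(docs))
```

The resulting payload can then be sent to the Meilisearch add-documents route (POST /indexes/<index_uid>/documents) with the Content-Type header set to application/x-ndjson.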
I need to convert the text in a PDF file to images, so that users cannot copy it from the PDF.
This should be equivalent to converting the entire PDF to a set of images and then merging them into a single document. I did that, but it seems slow. Is there any way to do it with Ghostscript options?
Welp, it looks like I only need to specify the -dNoOutputFonts option.
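For reference, a sketch of what the full call might look like, assuming a Ghostscript build whose pdfwrite device supports -dNoOutputFonts. As far as I understand, the option makes pdfwrite emit text as outlines instead of embedding fonts, so the text can no longer be selected or copied. The helper name and file names below are hypothetical.

```python
import subprocess


def gs_outline_text_command(src, dst, gs_binary="gs"):
    """Build a Ghostscript command that rewrites src to dst without fonts."""
    return [
        gs_binary,               # e.g. "gswin64c" on Windows
        "-o", dst,               # output file; -o also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",     # produce a PDF
        "-dNoOutputFonts",       # emit text as outlines rather than fonts
        src,
    ]


if __name__ == "__main__":
    cmd = gs_outline_text_command("input.pdf", "output.pdf")
    print(" ".join(cmd))
    # Uncomment to actually run it; requires Ghostscript on PATH.
    # subprocess.run(cmd, check=True)
```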
I am looking for the method (of the Word OLE object) that can open a PDF in Microsoft Word.
I want to copy all pages of the PDF into a doc/docx file and add footers there.
Could anybody give me a hint on how to import a PDF?
PS: any sample code for this problem would be great.
Thanks,
Lilya
You need an OCR (Optical Character Recognition) engine to convert a PDF to a document. PDF is a generic format and can store text as images, so converting a PDF to an editable document is hard. SAP does not have a built-in OCR function for this. OpenText (if the customer uses it) may have this functionality, but I don't have detailed information about OpenText. You need third-party tools. If the PDF contains real text, online services or command-line utilities can easily convert it to a text file; otherwise you need a professional SDK (for example ABBYY FineReader).
I used Foxit PDF Reader to save the PDF file as a text file and wrote a macro to read the text file. Of course, this way you only get the text and nothing else.
I am converting Marathi data from PDF to Excel or Word, but the formatting does not come through properly.
I copied some data from the PDF and pasted it into a Word document, but the text came out garbled.
E.g. this line is in the PDF: प्रविण सुधाकर शिरवाडकर
but when I copied and pasted it into Word it became
-प्रववर् सुधाकर शिरवाडकर
What should I do about this?
Can anyone please help me?
Thank you in advance.
There seem to be problems with the way the PDF stores Unicode Devanagari text. Try this alternative route: convert your PDF to images. You can use an online tool or a downloaded one, or on Linux run this command (it uses ImageMagick's convert) in a terminal:
for f in *.pdf; do convert -density 200 "$f" "${f%.pdf}_200dpi.jpg"; done
Change the density from 200 to another value as needed. Each page of your document should be converted into an image file. For a Windows tool, try https://www.pdfill.com/pdf_tools_free.html
Then go to http://www.i2ocr.com/free-online-hindi-ocr, upload the image, and convert it. That site uses OCR (optical character recognition).
Check which fonts your PDF uses and try making them available to the Word document.
I think you don't have the particular fonts that are used in the PDF.
In Adobe Reader, File menu > Properties > Fonts tab gives you a list of all fonts used in the document.
I have a few books that I absolutely MUST read; they are a set of calculus textbooks as PDF files. The problem is that the graphs and images in these PDF files are all PNG, which is apparently not supported by my Kindle. Is there any way I can batch-convert these images to JPEG or another format inside the PDF file? I have tried everything from converting the PDF to other formats (the equation formatting broke) to extracting the images from the PDF and converting them. I really need to know whether there is a program that can help, or whether there is a way to 'open' the PDF container, swap the PNG images for JPEG ones, and replace the .png file extensions with .jpg. Any help would be greatly appreciated.
The books are:
http://tutorial.math.lamar.edu/pdf/CalcI/CalcI_Complete.pdf
http://tutorial.math.lamar.edu/pdf/CalcII/CalcII_Complete.pdf