Can someone advise how to remove the printer marks in the pdf file generated by Bookdown?
This is really a LaTeX question. You can try using a different document class, though, as described here:
https://bookdown.org/yihui/bookdown/latexpdf.html
I am looking for the method (of the Word OLE object) that can open a PDF in Microsoft Word.
I want to copy all pages of the PDF into a doc/docx file and add footers there.
Could anybody give me a hint on how to import a PDF?
PS: Any sample code for this problem would be great.
Thanks,
Lilya
You need an OCR (Optical Character Recognition) engine to convert a PDF to a document. PDF is a generic format and can contain text as images, so converting a PDF to a document is very hard. SAP doesn't have any OCR function for this. Maybe OpenText (if the customer is using it) has this functionality; I don't have detailed information about OpenText. You need third-party tools for this. If the PDF contains real text, you can use online services or command-line utilities to convert it to a text file easily; otherwise you need a professional SDK (for example ABBYY FineReader).
I used Foxit PDF Reader to save the PDF file as a text file and wrote a macro to read the text file. Of course, this way you only get the text and nothing else.
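For the "PDF already contains text" case, here is a minimal sketch of driving a command-line utility from a script. It assumes Poppler's pdftotext is installed and on the PATH (that tool is my choice, not something the answer above names), and the function names are made up for illustration. It will not help with scanned pages, which still need OCR.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Return the argument list for Poppler's pdftotext CLI."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # ask pdftotext to keep the physical page layout
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(pdf_path, txt_path):
    """Run pdftotext and return the extracted text (needs a real text layer)."""
    if shutil.which("pdftotext") is None:
        # Assumption: the tool is installed separately; fail clearly if not.
        raise RuntimeError("pdftotext not found on PATH")
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

The resulting text file can then be read by whatever macro or program fills the Word document.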
I am trying to copy some text from a PDF, but when I paste it into a Word file, it is just garbage, something like മുഖവുര. The PDF is in Malayalam. Under File->Properties->Fonts, it says BRHMalayalam (Embedded Subset), as shown in the screenshot.
I installed various Malayalam fonts but still no luck. Can anyone please guide me?
The PDF I am trying to copy from is https://drive.google.com/open?id=0B3QCwY9Vanoza0tBdFJjd295WEE&authuser=0
Installing fonts won't help, since they are embedded in the document. The reader will use the ones in the document.
In fact it almost certainly must use the ones in the document, because the document will probably have used character codes specific to each font subset.
Your PDF probably has character codes which are not Unicode values, and does not contain ToUnicode CMaps for the fonts in question (note the same font name embedded multiple times). There is no realistic way to copy the text.
The best you can do is OCR it.
After looking at the file and confirming the answer already given by @KenS: the problem with this PDF document is in fact how it's constructed, or rather how the font in the document has been embedded.
The document contains a number of Times and Arial fonts, for which the text can be copied successfully. Those fonts are embedded as a subset with a WinAnsi encoding. What is actually in the file is close enough to that, that the text seems to copy out well.
The problem font (BRHMalayalam) is also embedded as a subset, and its encoding is also set to WinAnsiEncoding, which makes no sense at all.
And because the font doesn't contain a ToUnicode mapping table, a PDF viewer has no choice when copying and pasting but to assume the characters in the PDF are indeed WinAnsi-encoded, which means you end up with (garbled) Latin characters.
Just convert the PDF file to a Word file, then edit, copy, or modify the text in the file. Simple :)
When you are done, go to File -> Save As and change the format from doc to PDF. Hope you understood :)
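To make the effect concrete, here is a small sketch of what a viewer does with and without a ToUnicode table. The byte codes and the mapping are invented for illustration (they are not taken from the actual BRHMalayalam font), and WinAnsiEncoding is approximated by Python's cp1252 codec.

```python
# Hypothetical character codes as they might sit in the PDF content stream.
raw_codes = bytes([0x6D, 0x75, 0x4B, 0x76, 0x75, 0x72])

# Hypothetical ToUnicode table: font-specific code -> Unicode character.
to_unicode = {
    0x6D: "\u0d2e",  # MALAYALAM LETTER MA
    0x75: "\u0d41",  # MALAYALAM VOWEL SIGN U
    0x4B: "\u0d16",  # MALAYALAM LETTER KHA
    0x76: "\u0d35",  # MALAYALAM LETTER VA
    0x72: "\u0d30",  # MALAYALAM LETTER RA
}


def copy_without_tounicode(codes):
    """No ToUnicode table: the viewer falls back to the declared WinAnsi encoding."""
    return codes.decode("cp1252", errors="replace")


def copy_with_tounicode(codes, table):
    """With a ToUnicode table: each code is looked up individually."""
    return "".join(table.get(code, "\ufffd") for code in codes)


print(copy_without_tounicode(raw_codes))           # Latin letters
print(copy_with_tounicode(raw_codes, to_unicode))  # Malayalam text
```

The glyphs drawn on screen are identical in both cases; only the text that lands on the clipboard differs.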
I know people have asked similar questions, but I couldn't find an answer to this. I have a PDF file that was produced using pdflatex. It is searchable (you can press Ctrl+F and search for words inside) and it uses hyperref for the citations. I want to make a PS file out of it.
I tried pdf2ps from gs and pdftops from the poppler package. Both produce a document that behaves like a picture: you cannot search for anything inside, and the hyperref links don't work.
Is there any way I can make a PS file but at least keep it searchable?
Thanks in advance!
Why do you want to 'search inside' a PostScript file? A PostScript file is for printing.
What do you mean by 'hyperrefs don't work'? What do you expect them to do?
Consider this tool: renderpdf
I don't know about citations, but it makes searchable PostScript files.
I'm converting Word docs to PDF programmatically using VB.NET and Ghostscript. The Word doc I'm having problems with has hyperlinks to external URLs and also hyperlinks to bookmarks within the document. When the doc is converted to PDF, the external URLs work but the links to the bookmarks do not.
I have searched for a solution to get these bookmarks to work on the output PDF but haven’t had any luck. Hopefully someone has done this and can share the solution.
Ghostscript only handles PDF or PostScript as input. There are sibling products to handle XPS and PCL as well, but none of them handle Word .doc files. So you must be converting the Word file into something else first.
I'll hazard a guess that you are using the Windows PostScript printer driver to convert to PostScript and passing that to GS (possibly via the RedMon Port Monitor) to convert into PDF.
Now PostScript doesn't support hyperlinks, bookmarks, or any of the other paraphernalia of a viewing application, since it's intended as a print language. To overcome this, Adobe introduced an extension, the pdfmark operator, which can be used to create this kind of information. NOTE: this is an extension which is only supported for conversion to PDF.
So, in order to get these inserted, you need to create pdfmarks in the PostScript. If you are printing from Word, this means that you have to insert PostScript into the file when printing. There is a 'pass through' mechanism for this purpose.
So what you need to do is create the appropriate Visual Basic script in Word which inserts the relevant pdfmarks when the document is printed. This is how the Adobe plug-in for Word (which used to be called PDFMaker a long time ago) works.
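To give an idea of what those pdfmarks look like, here is a rough sketch that builds the PostScript fragments for one named destination and one link pointing at it. It is written in Python only to show the strings; the destination name, page number, and rectangle are placeholders, and getting the fragments into Word's print stream through the pass-through mechanism is not shown.

```python
# Usual guard so interpreters without pdfmark support simply ignore the marks.
PDFMARK_GUARD = "/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse"


def named_destination(name, page):
    """pdfmark that defines a named destination on the given page."""
    return f"[ /Dest /{name} /Page {page} /View [/XYZ null null null] /DEST pdfmark"


def link_to_destination(name, rect):
    """pdfmark that places a link annotation on the current page."""
    x1, y1, x2, y2 = rect  # link rectangle in default user space (points)
    return (
        f"[ /Rect [{x1} {y1} {x2} {y2}] /Border [0 0 0] "
        f"/Dest /{name} /Subtype /Link /ANN pdfmark"
    )


# Placeholder values: a bookmark called Chapter2 on page 3, linked from a
# rectangle near the top of whichever page is being printed at that moment.
print(PDFMARK_GUARD)
print(named_destination("Chapter2", 3))
print(link_to_destination("Chapter2", (72, 700, 200, 715)))
```

Each Word bookmark would need one /DEST mark, and each internal hyperlink one /ANN mark emitted on the page where the link text appears.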
Have a look at this tool.
It does maintain bookmarks and hyperlinks.
http://www.transcom.de/transcom/en/2004_pdf-t-maker.htm
I'm working on an ebook conversion script.
When I create a PDF file using emacs ps-print-buffer-with-faces and then ps2pdf, I can select the words one by one on my ebook (Sony PRS 600). Same when I use Microsoft Word to print to PDF.
Yet, when I create a PDF using pdflatex, or latex -> dvips -> ps2pdf, I can only select blocks of words, separated by punctuation marks.
It seems that there is something in the structure of the PDF files generated by latex that ebooks don't understand -- but what?
Do you know a switch to tell latex to behave properly, or a workaround?
Thanks!
CFP.
LaTeX doesn't use a whitespace character to separate words. That's the problem.
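Since there is no space character in the file, a reader has to guess word boundaries from the gaps between glyphs. The sketch below shows that kind of heuristic; the glyph positions and the gap_factor threshold are made up, and this is not a claim about how the Sony reader actually does it.

```python
def rebuild_text(glyphs, gap_factor=0.3):
    """Rebuild one line of text from positioned glyphs.

    glyphs: list of (char, x, width) tuples on the same line, sorted by x.
    A space is inserted when the gap before a glyph exceeds
    gap_factor times that glyph's width.
    """
    out = []
    prev_end = None
    for char, x, width in glyphs:
        if prev_end is not None and x - prev_end > gap_factor * width:
            out.append(" ")  # the gap is wide enough to count as a word break
        out.append(char)
        prev_end = x + width
    return "".join(out)


# Made-up positions: "hi" and "yo" separated by a 3-unit gap, no space glyph.
line = [("h", 0, 5), ("i", 5, 5), ("y", 13, 5), ("o", 18, 5)]
print(rebuild_text(line))                  # gap counts as a word break
print(rebuild_text(line, gap_factor=1.0))  # stricter threshold: words run together
```

A device with a threshold that is too strict, or with no such heuristic at all, would see everything between two punctuation marks as one selectable block.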