Here is the PDF file:
Here is the Markdown file:
I used this website to convert it.
Can I do anything to get proper Chinese characters in the PDF?
I am using Aspose.PDF for VB.NET to convert my data to a .pdf file.
The file contains Arabic characters and uses the Cairo font.
The generated PDF file does not display any Arabic words!
How can I solve this issue?
Today I tried to search for an Arabic word in a PDF file that contains Arabic content.
No PDF reader software can find any Arabic word in this PDF file.
So I dragged the PDF file into the Firefox browser, selected an area containing some words with Inspect Element, and saw this:
hw ½oiC instead of آخرین سخن
What type of encoding is used in this PDF file?
How can I convert this to normal text?
It's difficult to comment on the file you are looking at without seeing it, but a good starting point is to open it in Acrobat. Either copying the text and pasting it into a text editor, or searching for the text content, will reveal whether it can be extracted correctly.
If it can't be extracted properly, there's a good chance the font is lacking a ToUnicode entry (see Section 9.10.1 of the ISO 32000-1:2008 PDF specification for more information).
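To make the role of the ToUnicode entry more concrete, here is a minimal Python sketch of how a reader could use such a mapping to turn glyph codes back into text. It only handles `bfchar` sections (not `bfrange`), and the sample CMap fragment with codes 0x01 to 0x03 is invented for illustration; it is not taken from the file in the question.

```python
import re


def parse_bfchar(cmap_text):
    """Map source codes to Unicode strings from the bfchar sections of a ToUnicode CMap."""
    mapping = {}
    # Each bfchar section lists pairs: <source code> <UTF-16BE destination>
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_codes(codes, mapping):
    """Translate glyph codes to text; codes without a mapping become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)


# Hypothetical CMap fragment: the codes 0x01-0x03 are made up for this example.
SAMPLE_CMAP = """
3 beginbfchar
<01> <0622>
<02> <062E>
<03> <0631>
endbfchar
"""

mapping = parse_bfchar(SAMPLE_CMAP)
print(decode_codes([1, 2, 3], mapping))
```

Without such a mapping, a viewer can only fall back on the raw codes, which is probably why copied or searched text shows up as unrelated Latin characters.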
While using PD4ML for PDF creation, I have noticed a problem with HTML documents that contain Chinese characters.
The characters are Chinese and appear in their normal form in the source HTML.
Snippet from the PDF:
When converting a .docx file to a PDF/A-1a file with LibreOffice, the file created is not compliant with the PDF/A-1a standard.
When I try to validate the file using Preflight in Adobe Acrobat, the following error shows up:
Text cannot be mapped to unicode (154 matches on 2 pages)
And when I copy text from the PDF in Preview.app, all accented characters are missing or messed up.
From my research I understand that LibreOffice is not building the /ToUnicode mapping correctly for accented characters, because those characters are built from more than one glyph and LibreOffice only handles the first glyph. Ref: Can't copy text from PDF exported from OOo
Is there a workaround? How can I convert .docx to valid PDF/A programmatically on Linux?
For info, here's the command I use to convert the file:
unoconv -f pdf -eSelectPdfVersion=1 source-file.docx
This other command does not produce a PDF/A compliant file, as expected, but it has the same Unicode mapping problem:
libreoffice --headless --convert-to pdf source-file.docx
The problem is present in LibreOffice 5.2.3.3, which I was using. It is not present in LibreOffice 5.1.4.2 and 5.1.6.2.
So downgrading to 5.1.6.2 fixed my problem.
I added more information to an existing bug report.
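The question in this thread attributes the problem to accented characters being built from more than one glyph. The Python sketch below illustrates only the Unicode side of that idea, using the standard `unicodedata` module. It does not reproduce LibreOffice's internals: it just shows that an accented letter can be a base letter plus a combining mark, and that keeping only the first element drops the accent.

```python
import unicodedata


def describe(text):
    """List the code points in text together with their Unicode names."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


precomposed = "\u00e9"                                  # e with acute as one code point
decomposed = unicodedata.normalize("NFD", precomposed)  # base letter + combining accent

print(describe(precomposed))
print(describe(decomposed))

# Keeping only the first element of the sequence loses the accent,
# which resembles the copy-and-paste symptom described in the question.
first_only = decomposed[0]
print(first_only)
```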
I am generating PDF files which contain English and Chinese characters (using the Ruby Prawn library). I don't want to embed a Chinese font file in the generated PDF files, because these files need to stay small. So I'm wondering if I could just mention a Chinese font name in my PDF files and have the PDF readers render the Chinese characters correctly, because the readers would already have the Chinese font file.
Is that sensible? If so, is there any commonly used Chinese font that one can expect to be installed in most of the PDF readers used by Chinese people?
The best way to ensure that a PDF file can be displayed in any reader is to use partially embedded fonts (also known as font subsets). In PDF, you don't need to include the whole font with your document; a subset with just the glyphs used in the file is enough for the file to be portable.
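As a rough illustration of why a subset stays small, the Python sketch below collects the distinct characters a document actually uses. This is only an approximation of what a font subsetter needs: a real subsetter works on glyphs rather than characters and also has to deal with things like composite glyphs. The sample text is hypothetical.

```python
def characters_needed(text):
    """Return the distinct non-whitespace characters that appear in text."""
    return sorted({ch for ch in text if not ch.isspace()})


# Hypothetical document text mixing English and Chinese.
sample = "PDF 文件 PDF 文件"
needed = characters_needed(sample)
print(needed, len(needed))
```

Even though a Chinese font may contain many thousands of glyphs, a short document like this one only needs a handful of them embedded.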