Headless convert-to PDF: soft hyphens replaced with zero-width spaces

I'm working on a web app that creates LibreOffice documents, which I want to convert to PDF with unoconv and a headless LibreOffice.
There is just one problem I can't solve: the soft hyphens I include in the .odt are replaced with zero-width spaces in the resulting PDF. The problem is not related to unoconv; I tried it directly with a headless LibreOffice (same result). I tried both v4.1.4.2 and v4.2.5.2.
I tried another font (Ubuntu; I use Arial as the body font), as I suspected that the missing Arial font on Linux was causing the problem. I have the problem both on the production server with Debian 7 and in a VirtualBox VM with Ubuntu 12.04.
I even installed the Arial font, in case the cause was LibreOffice being unable to calculate where to set the "real" hyphens without the font file at hand.
Strange thing: using LO 4.1.4.2 on my Mac (headless, of course) produces flawless PDFs. So the problem must be related either to Linux or to some missing "graphical" package in my server setup. I installed the hyphen-de package, which produces hyphens based on the dictionary, but the specified soft hyphens are still replaced with zero-width spaces.
The problem affects both body text and the text boxes used for annotations.
I'd appreciate any hint very much!
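A quick way to confirm the symptom is to compare the text that goes into the .odt with the text extracted from the PDF: a soft hyphen is U+00AD, a zero-width space is U+200B. The Ruby sketch below only counts both characters in a string; how the PDF text is obtained (for example with a PDF-to-text tool such as pdftotext) is left open, and the sample word is illustrative.

```ruby
# Minimal sketch: count soft hyphens (U+00AD) and zero-width spaces (U+200B)
# in a string, e.g. the source text versus text extracted from the PDF.
SOFT_HYPHEN = "\u00AD"
ZERO_WIDTH_SPACE = "\u200B"

def hyphen_report(text)
  {
    soft_hyphens: text.count(SOFT_HYPHEN),
    zero_width_spaces: text.count(ZERO_WIDTH_SPACE)
  }
end

# Illustrative input: what goes into the .odt ...
source = "Silben\u00ADtrennung"
# ... and what the broken PDF effectively contains instead.
broken = "Silben\u200Btrennung"

p hyphen_report(source)
p hyphen_report(broken)
```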

I had a similar problem.
I had to install the hyphenation package matching the document's language.
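To see which language the document actually declares (and therefore which hyphenation package is needed), one option is to look at the fo:language and fo:country attributes in the styles.xml or content.xml inside the .odt archive. The sketch below assumes you already have that XML as a string, extracts the declared languages with a regular expression rather than a full XML parser, and prints an apt-cache search per language. The hyphen-<language> name prefix is an assumption based on the hyphen-de package mentioned in the question.

```ruby
# Minimal sketch: list the languages declared in ODF XML via
# fo:language / fo:country attributes and print an apt-cache search for a
# matching hyphenation package.
# Assumption: package names start with "hyphen-<language>", like hyphen-de.
# The language code "zxx" means "no language" and is skipped.
LANG_ATTR = /fo:language="([a-z]{2,3})"(?:\s+fo:country="([A-Z]{2})")?/

def declared_languages(xml)
  xml.scan(LANG_ATTR)
     .reject { |lang, _country| lang == "zxx" }
     .map { |lang, country| country ? "#{lang}-#{country}" : lang }
     .uniq
end

def search_commands(xml)
  declared_languages(xml)
    .map { |tag| tag.split("-").first }
    .uniq
    .map { |lang| "apt-cache search --names-only '^hyphen-#{lang}'" }
end

styles_xml = <<~XML
  <style:text-properties fo:language="de" fo:country="DE"/>
  <style:text-properties fo:language="en" fo:country="US"/>
  <style:text-properties fo:language="zxx" fo:country="none"/>
XML

puts declared_languages(styles_xml).inspect
puts search_commands(styles_xml)
```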

Related

How to convert unusual unicode characters (UTF-8) to PDF?

I would like to convert a text file containing Unicode characters in UTF-8 to a PDF file. When I cat the file or look at it with vim, everything is great, but when I open the file with LibreOffice, the formatting is off. I have tried various fonts, none of which have worked. Is there a font file somewhere on my Ubuntu 16.04 system which is used for display in a terminal window? It seems that would be the font to tell LibreOffice to use.
I am not attached to LibreOffice. Any app that will convert the text file into a PDF file is fine. I have tried txt2pdf and pandoc without success.
This is what the file looks like
To be more specific about the problem, below is an example of what the above lines look like in LibreOffice using Liberation Mono font (no mono font does better):
I answered you by mail, but here is the answer. You are using some very specific characters, the most difficult to find being in the Miscellaneous Symbols Unicode block, for instance the SESQUIQUADRATE, which should appear on your second line as ⚼.
A quick search led me to the following two candidates (for monospace fonts):
Everson Mono
GNU Unifont
The block is also partially covered by PragmataPro, which is a very good font. I tried an old version of it and it had all of your characters, but there was an issue: the Sun character (rendered as ☉) seems to be printed twice as wide as the other characters. My version of this font is rather old, though, and perhaps buggy.
Once you have chosen the font suiting your needs, you should be able to render your documents as PDF with various tools. I did all my experiments with txt2pdf, which I use daily for many documents.
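To find out which characters of a file actually come from the Miscellaneous Symbols block (U+2600 to U+26FF), and therefore which glyphs a candidate font has to cover, a short Ruby sketch can scan the text. The sample line is illustrative and uses the Sun (U+2609) and SESQUIQUADRATE (U+26BC) characters mentioned above.

```ruby
# Minimal sketch: report which characters of a line fall into the
# Miscellaneous Symbols block (U+2600..U+26FF), i.e. the ones a candidate
# monospace font must cover.
MISC_SYMBOLS = 0x2600..0x26FF

def misc_symbols(text)
  text.each_char.select { |ch| MISC_SYMBOLS.cover?(ch.ord) }.uniq
end

line = "\u2609 Sun, \u26BC sesquiquadrate, plain ASCII"
misc_symbols(line).each do |ch|
  # Print the code point in the usual U+XXXX notation next to the glyph.
  puts format("U+%04X %s", ch.ord, ch)
end
```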

Powerline Glyphs Overlapping

As shown in the image below, the git prompt has overlapping glyphs.
I installed this theme by following the instructions listed here. What doesn't make sense is that all prompts other than the git prompt look completely fine. So I guess the question is: why would only the git extension be affected by this glyph misalignment?
I've tried my best to research similar issues but could not find anything beyond questions such as this one.
The environment that I am using consists of the following:
Kubuntu 15.04
Konsole
Tmux
zsh
xterm-256color
oh-my-zsh
powerline status bar
powerlevel9k theme
Font: Ubuntu Mono derivative Powerline
My .zshrc contains these two lines to enable the powerlevel9k theme:
ZSH_THEME="powerlevel9k/powerlevel9k"
POWERLEVEL9K_MODE="awesome-fontconfig"
Any insights on tweaking these glyphs will be incredibly helpful. So thanks in advance!
In "awesome-*" mode, powerlevel9k uses the awesome-terminal-fonts. Some of the glyphs in there are double-width, which is why we added some extra whitespace to these icons (see here). Most of us use the "awesome-patched" mode, which requires pre-patched fonts but is easier to install.
A quick fix would be to add some more whitespace. Could you try that and, if it works, open a pull request? That would be nice.
Another wild guess: what locale do you use? We had some strange issues with LANG=C. If that is the case on your machine, try setting it to a proper UTF-8 one.
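When checking the locale, note that for character handling POSIX lets LC_ALL override LC_CTYPE, which in turn overrides LANG, so looking at LANG alone can be misleading. The sketch below applies that precedence to the environment and reports whether the result looks like a UTF-8 locale. The suggested en_US.UTF-8 is only an example and has to be available on the system (see `locale -a`).

```ruby
# Minimal sketch: determine the effective character-type locale using the
# POSIX precedence (LC_ALL > LC_CTYPE > LANG) and check whether it is UTF-8.
def effective_ctype(env = ENV)
  %w[LC_ALL LC_CTYPE LANG].each do |var|
    value = env[var]
    return value unless value.nil? || value.empty?
  end
  "C" # POSIX default when none of the variables is set
end

def utf8_locale?(env = ENV)
  effective_ctype(env).match?(/utf-?8/i)
end

puts "effective LC_CTYPE: #{effective_ctype}"
puts(utf8_locale? ? "UTF-8 locale" : "not a UTF-8 locale; try e.g. LANG=en_US.UTF-8")
```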

wkhtmltopdf and Chinese characters

I'm trying to generate a PDF with wkhtmltopdf, but it has a lot of trouble displaying all the characters.
Some of the characters work, e.g. when printing:
"Invoice No (付款编号)": Chinese characters 1, 2 and 4 are printed correctly, but character 3 just shows up as an empty space in the PDF.
"Customer no (客户编号)": Chinese characters 1 and 4 are displayed correctly, but characters 2 and 3 aren't displayed in the PDF.
"Total (总额)": none of the Chinese characters are displayed in the generated PDF.
I'm on an Ubuntu 14.04 desktop system with wkhtmltopdf version "wkhtmltopdf 0.12.1 (with patched qt)". I have installed the Chinese fonts, and all the characters are displayed correctly in both gedit and Firefox on my system, but wkhtmltopdf only displays about 75% of them.
My HTML document uses the UTF-8 character set and is displayed correctly in Firefox and gedit. I have also tried to embed the font directly in the style section of the header using src: url(data:font/ttf;base64,AAEA....), and wkhtmltopdf changes the font face as expected, but the missing characters are still missing.
Any help is really really appreciated as I'm getting out of ideas.
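For reference, the embedding approach described in the question can be sketched as follows: the font file is base64-encoded and placed in a data URL inside an @font-face rule. The family name and the command-line font path are hypothetical. This only changes the result if the embedded font itself contains the missing glyphs.

```ruby
# Minimal sketch: build an @font-face rule that embeds a font file as a
# base64 data URL. The family name and font path are hypothetical.
def font_face_css(family, font_bytes, mime: "font/ttf")
  encoded = [font_bytes].pack("m0") # strict base64, no line breaks
  <<~CSS
    @font-face {
      font-family: "#{family}";
      src: url(data:#{mime};base64,#{encoded});
    }
  CSS
end

# Usage: ruby font_face.rb path/to/font.ttf
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  bytes = File.binread(ARGV[0])
  puts font_face_css("EmbeddedCJK", bytes)
end
```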
Did you install the Chinese, Japanese, and Korean Fonts that are mentioned in the Ubuntu Community Help Wiki?
By inspecting in detail a PDF generated on another system, you can find out which font wkhtmltopdf uses on that system and then locate the proper substitute.
Dalibror Nasevic did this work for a large subset of Asian fonts and described what he had to install on a CentOS (Red Hat) based system:
Figuring out missing fonts for wkHTMLtoPDF
On a headless Debian stretch based system, following Dalibror Nasevic's write-up, I had to add fonts-droid-fallback, fonts-wqy-microhei and fonts-wqy-zenhei.
In addition, following the recommendations from the Ubuntu Community Help Wiki, fonts-dejima-mincho, fonts-nanum-coding, fonts-takao, fonts-takao-gothic, fonts-takao-mincho might be worth giving a try.
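After installing fonts, fontconfig can tell you whether any installed font has a glyph for a given character: `fc-list ':charset=<hex code point>' family` lists the font families that contain it. The sketch below runs that query once per distinct non-ASCII character and returns the characters nothing covers. It assumes fc-list is on PATH, and an installed font does not guarantee that wkhtmltopdf will pick it, so treat this as a first check only.

```ruby
require "open3"

# Minimal sketch: ask fontconfig which font families contain a glyph for
# the given character; returns an empty array if none (or if fc-list fails).
def fonts_with_glyph(ch)
  out, status = Open3.capture2("fc-list", format(":charset=%x", ch.ord), "family")
  status.success? ? out.lines.map(&:strip).reject(&:empty?) : []
end

# Returns the distinct non-ASCII characters of `text` for which `lookup`
# finds no font. `lookup` defaults to the fc-list query above.
def missing_glyphs(text, lookup = method(:fonts_with_glyph))
  text.each_char.reject(&:ascii_only?).uniq.select { |ch| lookup.call(ch).empty? }
end
```

For example, `missing_glyphs("Total (总额)")` returns the characters for which fc-list found no font; an empty array means fontconfig knows at least one font for each of them.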

docsplit conversion to PDF mangles non-ASCII characters in docx on Linux

My documentation management app involves converting a .docx file containing non-ASCII Unicode characters (Japanese) to PDF with docsplit (via the Ruby gem, if it matters). It works fine on my Mac. On my Ubuntu machine, the resulting PDF has square boxes where the characters should be, whether invoked through Ruby or directly on the command line. The odd thing is, when I open up the .docx file directly in LibreOffice and do a PDF export, it works fine. So it would seem there is some aspect to how docsplit invokes LO that causes the Unicode characters to be handled improperly. I have scoured various parts of the documentation and code for options that I might need to specify, with no luck. Any ideas of why this could be happening?
FWIW, docsplit invokes LO with the following options line in pdf_extractor.rb:
options = "--headless --invisible --norestore --nolockcheck --convert-to pdf --outdir #{escaped_out} #{escaped_doc}"
I notice that the output format can optionally be followed by an output filter name, as in pdf:output_filter_name. Is this something I need to think about using?
I have tracked this down to the --headless option which docsplit passes to LibreOffice. That invokes a non-X version of LO, which apparently does not have the necessary Japanese fonts. Unfortunately, there appears to be no way to pass options to docsplit to tell it to omit the --headless option to LO, so I will end up patching or forking the code somehow.
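If you do patch docsplit, building the arguments as an array avoids hand-escaping the paths. The sketch below mirrors the options line from pdf_extractor.rb with a hypothetical `headless:` switch (docsplit itself offers no such option, as noted above) and an optional export filter. writer_pdf_Export is the name of Writer's PDF export filter, so `--convert-to pdf:writer_pdf_Export` should be equivalent to `--convert-to pdf` for text documents. The `soffice` executable name is an assumption, and without --headless LibreOffice presumably needs an X display (for example a virtual one such as Xvfb) on a server.

```ruby
require "shellwords"

# Minimal sketch of the LibreOffice invocation from pdf_extractor.rb, with a
# hypothetical `headless:` switch and an optional PDF export filter name.
def office_args(doc, out_dir, headless: true, filter: nil)
  target = filter ? "pdf:#{filter}" : "pdf"
  args = []
  args << "--headless" if headless
  args + ["--invisible", "--norestore", "--nolockcheck",
          "--convert-to", target, "--outdir", out_dir, doc]
end

args = office_args("report.docx", "/tmp/out", headless: false)
# Show the equivalent shell command line (properly escaped).
puts Shellwords.join(["soffice", *args])
# To actually run it without a shell: system("soffice", *args)
```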

Chinese characters in IntelliJ IDEA 12 overlapped

I use IntelliJ IDEA to develop my Android project. Today I encountered this issue while editing the string XML resource file: the Chinese characters do show, but they overlap one another. So basically all you see is a bunch of Chinese characters piled up in a single character space. Interestingly, when you try to delete those Chinese characters, you delete the following XML closing tag instead of the Chinese characters themselves...
I have tried copy/paste, with the same result. I am using the Windows 32-bit version.
Can anybody help to fix this issue?
Please check this issue and linked issues for the problem background.
Right now, when IDEA doesn't find a glyph in the current editor font (set in File | Settings | Editor | Colors & Fonts | Font), it searches for the first font that has this glyph and may find a font with incorrect metrics, which displays overlapping glyphs.
When this request is implemented, you'll be able to specify the order of fallback fonts so that a properly working font is tried first.
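The fallback behaviour described above can be modelled in a few lines: use the editor font if it has the glyph, otherwise take the first font in the fallback order that has it. This is only a model, not IDEA's implementation, and the font names and their coverage are hypothetical. It shows why ordering matters: if a font with broken metrics comes first, it wins even though a good one is installed.

```ruby
# Minimal model of glyph fallback. Font names and coverage are hypothetical.
Font = Struct.new(:name, :chars) do
  def glyph?(ch)
    chars.include?(ch)
  end
end

# Editor font first; otherwise the first fallback font that has the glyph.
# Returns nil when no font covers the character.
def font_for(ch, editor_font, fallbacks)
  return editor_font if editor_font.glyph?(ch)
  fallbacks.find { |font| font.glyph?(ch) }
end

editor = Font.new("Consolas", "abc<>/")
broken = Font.new("BadMetricsFont", "中文") # has the glyphs, wrong metrics
good   = Font.new("GoodCJKFont", "中文")

# Discovery order: the badly behaved font happens to come first.
puts font_for("中", editor, [broken, good]).name
# With a user-defined fallback order, the good font is used instead.
puts font_for("中", editor, [good, broken]).name
```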
At the moment, the solution is to change the editor font to one that has all the required glyphs and proper font metrics, or to find and uninstall the font that is tried first and displayed incorrectly. Note that when running under JDK 1.7, IDEA also tries .otf fonts, not just .ttf; that is why the behavior differs between IDEA 11 (which defaults to JDK 1.6) and IDEA 12 (which runs under JDK 1.7).