hreflang for China Taiwan zh-Hans zh-Hant or zh-cn zh-tw - seo

Hello, and thanks for any input. For hreflang, should I use zh-Hans and zh-Hant, or zh-cn and zh-tw,
as seen below? Cheers.
简体中文 (Chinese Simplified)
繁體中文 (Chinese Traditional)
or
简体中文 (Chinese Simplified)
繁體中文 (Chinese Traditional)

Related

Interpretation of CID Characters in text extracted from PDF

I use pdfminer in Python to extract text from PDF documents. Some special characters are now represented as (cid:xxx). Here are two examples of lines extracted from a German text on physics:
(cid:129) der Fragestellungen
or
um a(cid:4) Atomradius(cid:4) 0;05 nm kommt die Formel der Realität sehr nahe. Resultat
Is there any way to figure out what these codes stand for? Ideally, they should be replaced by a Unicode character.
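One possible post-processing step, as a minimal sketch: find every (cid:N) token with a regular expression and substitute a character from a lookup table. pdfminer appears to emit (cid:N) when it cannot map a font's character ID to Unicode, so the numbers are font-specific and the table has to be built by hand, for example by comparing against the rendered PDF. The function name replace_cids and the sample mapping are assumptions for illustration and are not part of pdfminer.

```python
import re

# Matches tokens such as "(cid:129)" and captures the number.
CID_PATTERN = re.compile(r"\(cid:(\d+)\)")

def replace_cids(text, cid_map):
    """Replace (cid:N) tokens using cid_map; leave unknown tokens untouched."""
    def substitute(match):
        cid = int(match.group(1))
        return cid_map.get(cid, match.group(0))
    return CID_PATTERN.sub(substitute, text)

# Hypothetical mapping for one specific font; it must be verified against the PDF.
sample_map = {129: "\u2022"}  # assumption: cid 129 is a bullet in this font
print(replace_cids("(cid:129) der Fragestellungen", sample_map))
```

Tokens without an entry in the table are left as they are, so unmapped codes remain visible for later inspection.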

How to Display emojis in PDFs generated by FOP Apache

I am looking for a way to display as many characters (including emojis) as possible in PDFs generated by FOP Apache. The FOP people advise using a font containing the emojis, but I tried NotoColorEmoji.ttf from Google and got an exception. I have also tried Symbola etc., but all fonts seem to be old, and emojis joined by a ZWJ (zero width joiner) don't work. I also tried Courier New, since it displays emojis correctly in Windows 10, but the TTF does not contain emojis. The characters I need to display are as follows:
ÄÖÜ
个相同基因的更多拷⻉来提⾼适应性
Πρωτότυπο κβαντικό ραντάρ από ερευνητές στην Αυστρία
☹️😀😃😄😁😆😅😂🤣☺️😊😇🙂🙃😉😌😌😍😘😗
👩‍⚕️
The Exception when using NotoColorEmoji.ttf is as follows:
2021-01-27 15:19:18,104 ERROR Failed to read font file file:///C:/Roboto/Noto/NotoColorEmoji.ttf 'loca' table not found, happens when the font file doesn't contain TrueType outlines (trying to read an OpenType CFF font maybe?): java.io.IOException: 'loca' table not found, happens when the font file doesn't contain TrueType outlines (trying to read an OpenType CFF font maybe?)
You'll likely need a few fonts to capture all of those characters.
You could try including some of these in your list of font families (your ordering may need some tweaking depending on your preferences of what is used first if it has the glyphs). My "go to" list is for both Mac and Windows, and captures more than what is in your sample, but I think it should handle it.
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:fox="http://xmlgraphics.apache.org/fop/extensions"
font-family="Segoe UI, Helvetica Neue, Helvetica, Segoe UI Emoji, Symbola, Arial, PingFang HK, PingFang SC, PingFang TC, Heiti SC, Heiti TC, Beijing, Taipei, Malgun Gothic, Batang, Gungsuh, Microsoft JhengHei, Microsoft YaHei, Aparajita, Kokila, Mangal, Nirmala UI, Sanskrit Text, Utsaah, Tahoma, Arial Unicode MS, Apple SD Gothic Neo, AppleGothic, sans-serif">
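To see which code points the fonts in such a list have to cover, a small sketch like the following can break a sample string into its parts. It assumes the health-worker emoji from the question is the usual four-code-point ZWJ sequence; the helper name describe is made up for this example.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# Assumed spelling of the health-worker emoji: woman + ZWJ + medical symbol
# + variation selector. A font must cover the parts, and the renderer must
# support joining them, for the sequence to appear as a single glyph.
for code_point, name in describe("\U0001F469\u200D\u2695\uFE0F"):
    print(code_point, name)
```

Running the same helper over the Chinese and Greek samples shows which scripts are involved, which can help when ordering the font families.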

Characters not displayed when viewing PDF in Gmail

I am having issues with previewing a PDF in Gmail. It doesn't recognize some of the international characters that I am using (it doesn't show letters like ą ć ś, but it shows, for example, ł). I am encoding the PDF with Cp1250.
Any ideas on what's going on?
It looks like you are using the Standard 14 Fonts and don't embed them into your PDF. PDF readers are required to bring along these fonts, but only with a limited character set, which does not include ą, ć, or ś but does include ł. This matches your observation:
(it doesn't show letters like ą ć ś, but it shows for example ł)
For details on these fonts, see the PDF specification:
9.6.2.2 Standard Type 1 Fonts (Standard 14 Fonts)
The PostScript names of 14 Type 1 fonts, known as the standard 14 fonts, are as follows: Times-Roman, Helvetica, Courier, Symbol, Times-Bold, Helvetica-Bold, Courier-Bold, ZapfDingbats, Times-Italic, Helvetica-Oblique, Courier-Oblique, Times-BoldItalic, Helvetica-BoldOblique, Courier-BoldOblique
These fonts, or their font metrics and suitable substitution fonts, shall be available to the conforming reader.
NOTE The character sets and encodings for these fonts are listed in Annex D. The font metrics files for the standard 14 fonts are available from the ASN Web site (see the Bibliography). For more information on font metrics, see Adobe Technical Note #5004, Adobe Font Metrics File Format Specification.
In Annex D you'll find ł but not ą, ć, or ś.
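As a quick cross-check, sketched under the assumption that Cp1250 here means Windows-1250 (Python's cp1250 codec): the encoding itself can represent all four letters, so it is not what separates ł from ą, ć, and ś. That is consistent with the difference coming from the glyphs available in the viewer's fonts. The helper name encodable is made up for this example.

```python
def encodable(text, encoding="cp1250"):
    """Map each character to True if the given encoding can represent it."""
    result = {}
    for ch in text:
        try:
            ch.encode(encoding)
            result[ch] = True
        except UnicodeEncodeError:
            result[ch] = False
    return result

# All four letters from the question are checked against Windows-1250.
print(encodable("ąćśł"))
```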

DITA OT printing '#' instead of Chinese characters in PDF

I am very new to DITA OT. I downloaded DITA-OT1.5.4_full_easy_install_bin and am playing around with it. I'm trying to print a few characters in Simplified Chinese (zh-CN) into a PDF. I see that the characters are printed correctly in XHTML, but in the PDF they are printed as "#".
In the command line I see this: "Warning: Glyph "?" (0x611f) not available in font "Helvetica"."
Here are the things I have tried so far:
In demo\fo\fop\conf\fop.xconf :
<fonts>
<font kerning="yes"
embed-url="file:///C:/Windows/Fonts/simsun.ttc"
embedding-mode="subset" encoding-mode="cid">
<font-triplet name="SimSun" style="normal" weight="normal"/>
</font>
<auto-detect/>
<directory recursive="true">C:\Windows\Fonts</directory>
</fonts>
In demo\fo\cfg\fo\attrs\custom.xsl :
<xsl:attribute-set name="__fo__root">
<xsl:attribute name="font-family">SimSun</xsl:attribute>
</xsl:attribute-set>
In demo\fo\cfg\fo\font-mapping.xml added this block for Sans, Serif & Monospaced logical fonts:
<physical-font char-set="Simplified Chinese">
<font-face>SimSun</font-face>
</physical-font>
In samples\concepts\garageconceptsoverview.xml :
<shortdesc xml:lang="zh_CN">職業道德感.</shortdesc>
And this is the command I am using to generate the PDF:
ant -Dargs.input=samples\hierarchy.ditamap -Dtranstype=pdf
Any help would be appreciated. Thanks.
[EDIT]
I see that the topic.fo file which gets generated in temp folder, does contain the Chinese characters correctly. Like this:
<fo:block font-size="10pt" keep-with-next.within-page="5" start-indent="25pt">職業道德感.</fo:block>
But I do not see the font related information anywhere in this document.
First of all, you should set the "xml:lang='zh_CN'" attribute on the root elements of all DITA topics and maps. This helps the DITA OT publishing decide which language to use for static text like "Table X" and which charset to use for the font mappings.
Then you should run the publishing with the "clean.temp" parameter set to "no".
After the publishing, you can look in the temporary files folder for a file called "topic.fo" and look inside it to see what font families are used, because even if you set a font on the root element, there are other places in the XSL-FO file where font families are set explicitly.
So instead of setting a font on the XSL-FO root element, you should edit the font mappings XML file and, for each of the logical fonts "Sans" and "Serif", configure the actual font family to use for the Chinese charset, something like:
<logical-font name="Sans">
.........
<physical-font char-set="Simplified Chinese">
<font-face>SimSun</font-face>
</physical-font>
......
</logical-font>
More about how the font mappings work:
https://www.oxygenxml.com/doc/versions/17.0/ug-editor/#topics/DITA-map-set-font-Apache-FOP.html
Update:
If you insist on having that XSLT customization which sets the "SimSun" font as a font family on the root element, then in the font-mappings.xml you need to define a new mapping for your alias:
<aliases>
<alias name="SimSun">SimSun</alias>
</aliases>
and then map the logical font to a physical one in the same font-mappings.xml:
<logical-font name="SimSun">
<physical-font char-set="Simplified Chinese">
<font-face>SimSun</font-face>
</physical-font>
</logical-font>
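To check which font families actually end up in the generated topic.fo, a sketch along these lines could count the font-family attribute values. The sample document is hand-written for illustration, and the function name font_families is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def font_families(fo_xml):
    """Count the font-family attribute values found in an XSL-FO document string."""
    root = ET.fromstring(fo_xml)
    return Counter(
        element.get("font-family")
        for element in root.iter()
        if element.get("font-family") is not None
    )

# Tiny hand-written sample; a real topic.fo would be read from the temp folder.
sample = (
    '<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" font-family="SimSun">'
    '<fo:block font-family="Helvetica">text</fo:block>'
    '<fo:block>more</fo:block>'
    '</fo:root>'
)
print(font_families(sample))
```

For a file on disk, ET.parse(path).getroot() could replace ET.fromstring. Any family that still shows up as Helvetica marks a place where the font mapping was not applied.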
0x611f is the Chinese character 感. Helvetica is a Western (Latin) font, so it does not contain this character. Search for the place where the "Helvetica" font is set; at that position your content (ditamap/dita) should use a Chinese font, not a Western one. Find the attribute that sets font-family to Helvetica and, in your own plugin, change it to "SimSun, Helvetica".
Sorry, I cannot answer your question, but you should definitely try a newer DITA-OT from http://dita-ot.github.io/. Your DITA-OT is not supported anymore. Maybe your problem goes away with the latest release.

What are the BCP-47 voice codes available for iOS 7 AVSpeechSynthesisVoice?

Today I'm very excited that the speech synthesis function is available in iOS 7.
I want to select the male voice (the default in OS X, called Alex).
I don't know what the BCP-47 code for him is and, by the way, how do I get the full list of all voice codes?
iOS 8 added Hebrew, no new languages were added in iOS 9 to 12:
ar-SA Arabic Saudi Arabia
cs-CZ Czech Czech Republic
da-DK Danish Denmark
de-DE German Germany
el-GR Modern Greek Greece
en-AU English Australia
en-GB English United Kingdom
en-IE English Ireland
en-US English United States
en-ZA English South Africa
es-ES Spanish Spain
es-MX Spanish Mexico
fi-FI Finnish Finland
fr-CA French Canada
fr-FR French France
he-IL Hebrew Israel
hi-IN Hindi India
hu-HU Hungarian Hungary
id-ID Indonesian Indonesia
it-IT Italian Italy
ja-JP Japanese Japan
ko-KR Korean Republic of Korea
nl-BE Dutch Belgium
nl-NL Dutch Netherlands
no-NO Norwegian Norway
pl-PL Polish Poland
pt-BR Portuguese Brazil
pt-PT Portuguese Portugal
ro-RO Romanian Romania
ru-RU Russian Russian Federation
sk-SK Slovak Slovakia
sv-SE Swedish Sweden
th-TH Thai Thailand
tr-TR Turkish Turkey
zh-CN Chinese China
zh-HK Chinese Hong Kong
zh-TW Chinese Taiwan
edit: Here is how to print the above in Swift:
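import AVFoundation

// Print each voice's BCP-47 code with its localized language and region names.
func printLanguages() {
    AVSpeechSynthesisVoice.speechVoices().forEach { voice in
        let components = Locale.components(fromIdentifier: voice.language)
        // Fall back to "?" instead of force-unwrapping, which crashes on a nil lookup.
        let language = Locale.current.localizedString(forLanguageCode: voice.language) ?? "?"
        let country = components["kCFLocaleCountryCodeKey"]
            .flatMap { Locale.current.localizedString(forRegionCode: $0) } ?? "?"
        print("\(voice.language) \t \(language) \t\t \(country)")
    }
}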
You need to import AVFoundation
Here's how to get the BCP-47 codes of the available voices:
for (AVSpeechSynthesisVoice *voice in [AVSpeechSynthesisVoice speechVoices]) {
    NSLog(@"%@", voice.language);
}
Alex's locale is "English - United States" (en-US), as you can see in the Dictation & Speech control panel on OS X. (Click "Customize..." in the "System Voice" drop down.)
As of iOS 7.1 there are 36 voices for the following BCP-47 codes:
ar-SA
cs-CZ
da-DK
de-DE
el-GR
en-AU
en-GB
en-IE
en-US
en-ZA
es-ES
es-MX
fi-FI
fr-CA
fr-FR
hi-IN
hu-HU
id-ID
it-IT
ja-JP
ko-KR
nl-BE
nl-NL
no-NO
pl-PL
pt-BR
pt-PT
ro-RO
ru-RU
sk-SK
sv-SE
th-TH
tr-TR
zh-CN
zh-HK
zh-TW
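The codes above all have the language-REGION shape. As a rough sketch of how such tags decompose, the following splits a simple BCP-47 tag into language, script, and region. It handles only the language[-Script][-REGION] form; the function name split_tag and the zh-Hant-TW sample are illustrative assumptions, not taken from the voice list.

```python
def split_tag(tag):
    """Split a simple BCP-47 tag into (language, script, region).

    Only the language[-Script][-REGION] shape is handled; other subtags
    (variants, extensions, private use) are ignored in this sketch.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    script = region = None
    for part in parts[1:]:
        # A script subtag is four letters and comes before any region.
        if len(part) == 4 and part.isalpha() and script is None and region is None:
            script = part.title()
        # A region subtag is two letters or three digits.
        elif region is None and (
            (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit())
        ):
            region = part.upper()
    return language, script, region

for tag in ["en-US", "zh-TW", "zh-Hant-TW"]:
    print(tag, split_tag(tag))
```

Grouping the list by the first element of the result gives the set of languages, regardless of how many regional voices each one has.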