Are all combinations of language codes and regions in the language-subtag-registry valid?

RFC 5646 (https://www.rfc-editor.org/rfc/rfc5646.html) and the IANA language subtag registry (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) respectively describe and list the language and region codes that make up the tags for identifying languages.
My question is:
is every combination of language subtag and region subtag valid?
For example, these are obvious variants of English:
en-GB is "English as spoken in Great Britain"
en-CA is "English as spoken in Canada"
en-US is "English as spoken in United States of America"
But is this valid, or is this combination not allowed because it is not recognized as a valid variant of English?
en-CN is "English as spoken in China"
Thanks!

Related

Use of Preferred-Value in Language Tag Records of Type "Variant" (RFC 5646)

In RFC 5646, Tags for Identifying Languages, § 3.1.2 Record and Field Definitions, the following explanation is given for the semantics of the Preferred-Value field when appearing in a record whose Type is "variant":
For fields of type 'script', 'region', or 'variant', 'Preferred-Value' contains the subtag of the same type that is preferred for forming the language tag.
My initial interpretation of this was that if the Type of the record is variant, then the value of a Preferred-Value is also a variant — "a subtag of the same type". In other words, I read "of the same type" as "of the same type as the record itself".
However, there are records in the current version of the language tag registry (2018-04-23 at the time I write this — it doesn’t seem there are versioned links) which do not match this interpretation. For example:
Type: variant
Subtag: arevela
Description: Eastern Armenian
Added: 2006-09-18
Deprecated: 2018-03-24
Preferred-Value: hy
Prefix: hy
The Preferred-Value here is not a variant — a variant must be either 5-8 alphanumeric ASCII characters or 1 digit plus three alphanumeric characters. In this case in particular, it’s clear that it’s referring to the Armenian language (the first segment of a language tag) rather than to a variant.
However, when looking through other entries, most Preferred-Value values do conform to my initial interpretation. For example:
Type: region
Subtag: YD
Description: Democratic Yemen
Added: 2005-10-16
Deprecated: 1990-08-14
Preferred-Value: YE
Here, Preferred-Value does appear to be another region code. The rules for script/region/variant types are given together — the Preferred-Value is the "same type" for all of these. If for a region record a "same type Preferred-Value" means "also a region", how is it that for a variant record Preferred-Value may point at a different type? More importantly, if this is possible, is the only way to determine the type of the Preferred-Value field to test its grammar?

You are right. That arevela entry does not conform to the registry specification. It seems as though they noticed this; the registry as of 2021-02-23 has a new entry for arevela without that Preferred-Value. It instead has the comment "Preferred tag is hy".
Your comment also seems to be correct (and my initial interpretation of the section in the spec was wrong). They've changed those entries too, so all extlangs now have a Preferred-Value that is a primary language subtag identical to the extlang.
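On the question of whether grammar is the only way to tell what a Preferred-Value refers to: grammar alone is ambiguous (a two-letter subtag can be a language or a region), so one alternative is to look the value up in the registry itself. Below is a minimal sketch of that idea. The function names are made up for illustration, the sample holds the two records quoted above plus abbreviated stand-ins for the hy and YE records, and the parser ignores folded continuation lines that the real file may contain.

```python
import re

# Abbreviated sample in the registry's record-jar layout ('%%' separates records).
SAMPLE = """\
Type: variant
Subtag: arevela
Description: Eastern Armenian
Added: 2006-09-18
Deprecated: 2018-03-24
Preferred-Value: hy
Prefix: hy
%%
Type: region
Subtag: YD
Description: Democratic Yemen
Added: 2005-10-16
Deprecated: 1990-08-14
Preferred-Value: YE
%%
Type: language
Subtag: hy
Description: Armenian
%%
Type: region
Subtag: YE
Description: Yemen
"""


def parse_records(text):
    """Split registry text on '%%' lines and collect 'Field: value' pairs."""
    records = []
    for chunk in re.split(r"^%%\s*$", text, flags=re.MULTILINE):
        fields = {}
        for line in chunk.strip().splitlines():
            name, sep, value = line.partition(":")
            if sep:  # skip anything that is not a 'Field: value' line
                # Fields such as Description and Prefix may repeat, so keep lists.
                fields.setdefault(name.strip(), []).append(value.strip())
        if fields:
            records.append(fields)
    return records


def registered_types(subtag, records):
    """Return every Type under which `subtag` is registered (case-insensitive)."""
    wanted = subtag.lower()
    return sorted({r["Type"][0] for r in records
                   if r.get("Subtag", [""])[0].lower() == wanted})


records = parse_records(SAMPLE)
for rec in records:
    for preferred in rec.get("Preferred-Value", []):
        print(rec["Type"][0], rec["Subtag"][0], "->", preferred,
              registered_types(preferred, records))
```

With this sample, the arevela record resolves to a language and the YD record to a region. Against the full registry the lookup can return more than one type for the same letters, so the record's own Type is still needed to break ties.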

List All the countries in a specific language

I need to list all the countries in French.
all_countries = self.env['res.country'].search([])
for country in all_countries:
    _logger.error(country.name)
With this code I get the English name of each country. How do I get the French one?

As far as I know, there is no link between countries and languages. You have to get the list from an external module. Try pycountry (https://pypi.python.org/pypi/pycountry/0.12.1): get the list of country codes (by language code fr-*), then search for matches in res.country.
Or you can override res.country and add a language reference yourself :)

How to get the user country always in English?

In my app I have a system with geolocation that compares the user's country with a string in my code. Here is something similar:
if (UserLocation.country == @"Switzerland") {
    //country is Switzerland
}
This system works fine if the user language of the device is English.
Some users have reported that it isn't working and that they have the device in German or Italian.
How can I force the device to give me the string of the country always in English?

It is generally not ideal to use natural language (i.e., the language we speak and write as humans) to store information that is inherently symbolic.
You found a good example yourself that arises when your software travels, as it does with modern devices used world-wide.
Another reason could be that even within the same language there may be several widely-used ways to refer to the same "object" or topic. Just think about America, which can be referred to as America, US, USA and probably a couple of other names, too. Or the UK, which is referred to as both the United Kingdom (UK) and Great Britain (GB). Or even for some simple object properties: Is it color or colour? I am sure that more examples can be found for other languages - especially if they are spoken in more than one country.
On top of that, natural language is also prone to misspellings, typing mistakes, use of abbreviations and changes to spelling standards over time.
Hence, a better solution would be to use a language-independent method.
The commonly accepted practice for country names is to use ISO country codes ("CH" for Switzerland, "DE" for Germany, etc., in case you use the 2-letter versions.)
In your database (which can be as simple as a plist file) you would create a field called for instance countrycode of length 2 characters which should take a value of "US", "CH", "DE", "FR", etc. following the ISO 3166 standard. You can also use 3-letter ISO country codes, but it seems that the most commonly used is the 2-letter one.
Then, for UI purposes you would keep another table, which could be a .strings file for localization with the natural language names of the countries:
"US" = "United States";
"DE" = "Germany";
"CH" = "Switzerland";
/* etc. */
On the screen you can translate to natural language:
NSString *isoCountryCode = @"CH"; // Or however you want to set it
// The second argument is a comment for translators; if the key is missing, the key itself is returned
NSString *countryName = NSLocalizedString(isoCountryCode, @"Country name for the ISO code");
// Now you can set the country name in human-readable form in the UI
myCountryNameLabel.text = countryName;
In the rest of your code, i.e., the code that handles the data before showing it to the user in the UI, you only use the ISO codes to distinguish countries. Some people refer to this part of the code as the business logic, and as a simple example it could look something like this:
if ( [UserLocation.countryCode isEqualToString: @"CH"] ) {
    // Do things needed for Switzerland
}
Internationalization is an extensive topic. Apple's documentation covers it for both iOS and OS X, and the two versions are largely identical.

I used Google Places and made a request to this URL:
"https://maps.googleapis.com/maps/api/geocode/json?latlng=\(userLocation.latitude),\(userLocation.longitude)&key=\(self.googlePlacesApiKey)&language=en&result_type=country"
That way it always gives me the results in English, independently of the user's language or locale settings.
You can change the information you receive with the result_type parameter. Full guide here:
https://developers.google.com/maps/documentation/geocoding/intro#Viewports

Is it necessary to translate the direction character in a latitude/longitude?

Possibly a subjective question this (although I hope not). I am translating an application into European Spanish. This application provides on-screen latitude and longitude display.
When displaying the direction of a given longitude in English, an example might be:
10° W 10' 2.42"
However, the word West in Spanish translates to Oeste. Is it convention to leave the character in the longitude in English or translate it to Spanish, like so:
10° O 10' 2.42"
Personally I feel that as a maritime standard it needs no translations, but if anyone can point me to an example where this is not true that would be much appreciated.
Thanks

Most languages do translate them: that's the case with Finnish, Swedish and German, at least. I'd still keep the English letters if it really is maritime software, since most people in specialized fields are accustomed to seeing the English characters denoting the hemispheres. Some translations might also confuse an unfamiliar user: for example, "south" in Finnish would be abbreviated as E, which is the abbreviation for east in the English version.
Another option worth considering is signed numbers, i.e. -10° for 10° W. That's quite widely used and unambiguous. If you aren't running out of space, I'd use the full names of the hemispheres instead of abbreviations, to stay on the safe side.
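The options above (translated letter, English letter, signed value) can be compared side by side with a small sketch. The function names and the letter table are hypothetical, only English and Spanish are filled in, and placing the hemisphere letter at the end of the string is an assumption rather than a convention taken from the question.

```python
# Hemisphere letters per UI language as (east, west); illustrative, not exhaustive.
HEMISPHERE_LETTERS = {
    "en": ("E", "W"),
    "es": ("E", "O"),  # Este / Oeste
}


def format_longitude(degrees: float, lang: str = "en") -> str:
    """Format signed decimal degrees as D° M' S.ss" plus a hemisphere letter."""
    east, west = HEMISPHERE_LETTERS[lang]
    letter = west if degrees < 0 else east
    # Work in whole hundredths of an arc-second so rounding never yields 60.00".
    total = round(abs(degrees) * 3600 * 100)
    whole_degrees, rest = divmod(total, 360_000)
    minutes, hundredths = divmod(rest, 6_000)
    return f"{whole_degrees}° {minutes}' {hundredths / 100:.2f}\" {letter}"


def format_signed(degrees: float) -> str:
    """Language-neutral alternative: negative values mean west."""
    return f"{degrees:+.6f}°"


print(format_longitude(-10.167339, "en"))
print(format_longitude(-10.167339, "es"))
print(format_signed(-10.167339))
```

Keeping the stored value as a signed number and choosing the letter only at display time means the translation decision stays in one place.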

What is The Turkey Test?

I came across the term 'The Turkey Test' while learning about code testing. I don't really know what it means.
What is the Turkey Test? Why is it called that?

The Turkey problem is related to software internationalization, or simply to software misbehaving under different language cultures.
Different countries have different standards, for example for writing dates (14.04.2008 in Turkey and 4/14/2008 in the US), numbers (e.g. 123,45 in Poland and 123.45 in the USA) and rules about character uppercasing (like in Turkey with the letters i, I and ı).
As Jeff Moser pointed out below, one such problem was reported by a Turkish user who found a bug in the ToUpper() function. There are more details in the comments below.
However, the problem is not limited to Turkey or to string conversions.
For example, in Poland and many other countries, dates and numbers are also written in a different manner.
Some links from a Google search for the Turkey Test :
Does Your Code Pass The Turkey Test?
by Jeff Moser
What's Wrong With Turkey?
by Jeff Atwood

The Turkey Test is described here:
Forget about Turkey, this won't even pass in the USA. You need a case insensitive compare. So you try:
String.Compare(string,string,bool ignoreCase):
....
Do any of these pass "The Turkey Test?"
Not a chance!
Reason: You've been hit with the "Turkish I" problem.
As discussed by lots and lots of people, the "I" in Turkish behaves differently than in most languages. Per the Unicode standard, our lowercase "i" becomes "İ" (U+0130 "Latin Capital Letter I With Dot Above") when it moves to uppercase. Similarly, our uppercase "I" becomes "ı" (U+0131 "Latin Small Letter Dotless I") when it moves to lowercase.

We write dates smaller to bigger like dd.MM.yyyy: 28.10.2010
We use '.'(dot) for thousands separator, and ','(comma) for decimal separator: 4.567,9
We have ö=>Ö, ç=>Ç, ş=>Ş, ğ=>Ğ, ü=>Ü, and most importantly ı=>I and i => İ; in other words, lower case of upper I is dotless and upper case of lower i is dotted.
People may have very stressful times because of meaningless errors caused by the above rules.
If your code properly runs in Turkey, it'll probably work anywhere.
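The rules listed above can be sketched in a few lines of Python. Python's built-in str.upper() and str.lower() are not locale-aware, so the Turkish pairs are mapped by hand first. The helper names are invented for this illustration, and the number formatter assumes one decimal place to match the example above.

```python
from datetime import date

# Turkish-specific case pairs: i <-> İ (U+0130) and ı (U+0131) <-> I.
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})
# Swap the separators produced by Python's default number formatting.
_SWAP_SEPARATORS = str.maketrans(",.", ".,")


def turkish_upper(text: str) -> str:
    """Uppercase using the Turkish dotted/dotless i rules."""
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text: str) -> str:
    """Lowercase using the Turkish dotted/dotless i rules."""
    return text.translate(_TR_LOWER).lower()


def format_number_tr(value: float) -> str:
    """Dot for thousands, comma for decimals, one decimal place."""
    return f"{value:,.1f}".translate(_SWAP_SEPARATORS)


def format_date_tr(day: date) -> str:
    """Day, month, year separated by dots."""
    return day.strftime("%d.%m.%Y")


print("istanbul".upper(), "vs", turkish_upper("istanbul"))
print("ISPARTA".lower(), "vs", turkish_lower("ISPARTA"))
print(format_number_tr(4567.9), format_date_tr(date(2010, 10, 28)))
```

The first two lines show the mismatch that causes the bugs: the default conversion and the Turkish conversion of the same string are no longer equal, so any comparison that relies on upper- or lowercasing can fail.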

The so-called "Turkey Test" is related to software internationalization. One problem of globalization/internationalization is that date and time formats in different cultures can differ on many levels (day/month/year order, date separator, etc.).
Also, Turkey has some special rules for capitalization, which can lead to problems. For example, the Turkish "i" character is a common problem for many programs which capitalize it in a wrong way.

The link provided by @Luixv gives a comprehensive description of the issue.
The summary is that if you're going to test your code on only one non-English locale, test it on Turkish.
This is because Turkish has instances of most edge cases you are likely to encounter with localization, including "unusual" format strings and non-standard characters (such as different capitalization rules for i).

Jeff Atwood has a blog article on the same topic, which is where I first came across it myself.
In summary, running your application under a Turkish locale is an excellent test of your i18n.
Here's Jeff's article.
