What is The Turkey Test?

I came across the term 'The Turkey Test' while learning about code testing, and I don't really know what it means.
What is the Turkey Test? Why is it called that?

The Turkey problem is related to software internationalization, or more simply, to software misbehaving under different language cultures.
Different countries have different standards, for example for writing dates (14.04.2008 in Turkey and 4/14/2008 in the US), numbers (e.g. 123,45 in Poland and 123.45 in the USA), and rules for uppercasing characters (as in Turkey with the letters i, I and ı).
As Jeff Moser points out below, one such problem was reported by a Turkish user who found a bug in the ToUpper() function. There are more details in the comments below.
However, the problem is not limited to Turkey or to string conversions.
For example, in Poland and many other countries, dates and numbers are also written differently.
Some links from a Google search for the Turkey Test:
Does Your Code Pass The Turkey Test? by Jeff Moser
What's Wrong With Turkey? by Jeff Atwood

Here is how the Turkey Test is described:
Forget about Turkey, this won't even pass in the USA. You need a case insensitive compare. So you try:
String.Compare(string,string,bool ignoreCase):
....
Do any of these pass "The Turkey Test?"
Not a chance!
Reason: You've been hit with the "Turkish I" problem.
As discussed by lots and lots of people, the "I" in Turkish behaves differently than in most languages. Per the Unicode standard, our lowercase "i" becomes "İ" (U+0130 "Latin Capital Letter I With Dot Above") when it moves to uppercase. Similarly, our uppercase "I" becomes "ı" (U+0131 "Latin Small Letter Dotless I") when it moves to lowercase.
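To see the Turkish I problem concretely: Python's built-in `str.upper()`/`str.lower()` apply Unicode's default, locale-independent case mappings, so they never apply the Turkish tailoring. The helpers `tr_upper`/`tr_lower` below are hypothetical illustrations (not a library API) that add the dotted/dotless rules by hand before delegating to the built-ins; a minimal sketch, assuming the text is already known to be Turkish.

```python
def tr_upper(s: str) -> str:
    """Uppercase with Turkish rules: i -> İ (U+0130), ı (U+0131) -> I."""
    return s.replace("i", "İ").replace("ı", "I").upper()


def tr_lower(s: str) -> str:
    """Lowercase with Turkish rules: I -> ı (U+0131), İ (U+0130) -> i."""
    return s.replace("I", "ı").replace("İ", "i").lower()


# Default (locale-independent) mappings: right for English, wrong for Turkish.
print("title".upper())      # TITLE
print(tr_upper("title"))    # TİTLE

# The classic bug: a keyword lowercased with Turkish rules no longer matches.
print(tr_lower("FILE") == "file")   # False, because the result is "fıle"

# Even the default mapping of İ is surprising: it lowercases to two code points.
print(len("İ".lower()))     # 2 ("i" followed by U+0307 COMBINING DOT ABOVE)
```

For comparisons that must be culture-neutral (file names, protocol keywords), the built-ins or `str.casefold()` are the safer choice; Turkish rules belong only on user-facing Turkish text.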

We write dates from the smallest unit to the largest, as dd.MM.yyyy: 28.10.2010
We use '.' (dot) as the thousands separator and ',' (comma) as the decimal separator: 4.567,9
We have ö=>Ö, ç=>Ç, ş=>Ş, ğ=>Ğ, ü=>Ü, and most importantly ı=>I and i=>İ; in other words, the lowercase of uppercase I is dotless, and the uppercase of lowercase i is dotted.
People can have a very stressful time because of seemingly meaningless errors caused by these rules.
If your code runs properly in Turkey, it will probably work anywhere.
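The date and number conventions in this answer can be sketched in Python without relying on the `locale` module, which only works if a Turkish locale such as tr_TR is installed on the host. `format_tr_date`, `format_tr_number` and `parse_tr_number` are hypothetical helpers written for this illustration; a real application would more likely use a CLDR-backed formatting library.

```python
from datetime import date


def format_tr_date(d: date) -> str:
    # Smallest unit first, dot-separated: dd.MM.yyyy
    return d.strftime("%d.%m.%Y")


def format_tr_number(x: float, decimals: int = 1) -> str:
    # Format US-style first ("4,567.9"), then swap both separators in one pass.
    us_style = f"{x:,.{decimals}f}"
    return us_style.translate(str.maketrans({",": ".", ".": ","}))


def parse_tr_number(text: str) -> float:
    # float("4.567,9") raises ValueError, so normalize to "4567.9" first.
    return float(text.replace(".", "").replace(",", "."))


print(format_tr_date(date(2010, 10, 28)))  # 28.10.2010
print(format_tr_number(4567.9))            # 4.567,9
print(parse_tr_number("4.567,9"))          # 4567.9
```

Note that `parse_tr_number` assumes its input really is Turkish-formatted: fed the US-style "4,567.9" it would silently return 4.5679, which is exactly the kind of bug the Turkey Test is meant to flush out.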

The so-called "Turkey Test" is related to software internationalization. One problem of globalization/internationalization is that date and time formats in different cultures can differ on many levels (day/month/year order, date separator, etc.).
Turkey also has some special rules for capitalization, which can lead to problems. For example, the Turkish "i" character is a common problem for many programs, which capitalize it incorrectly.

The link provided by @Luixv gives a comprehensive description of the issue.
In summary, if you're going to test your code on only one non-English locale, test it on Turkish.
This is because Turkish has instances of most edge cases you are likely to encounter with localization, including "unusual" format strings and non-standard characters (such as the different capitalization rules for i).

Jeff Atwood has a blog article on the same topic, which is where I first came across it myself.
In summary, attempting to run your application under a Turkish locale is an excellent test of your i18n.
Here's Jeff's article.

Related

TTS with Microsoft.Speech: best way to say a sentence fluently with a language change

I need to say a sentence with a German name in it. To do so, I used Microsoft Speech with English: I called the SpeakAsync function to say the first part of the sentence, then changed the language to German, said the name, then switched back to English and finished the sentence. This all works well, except that each time I call the SpeakAsync function there is a 1-second pause, so I get a 1-second pause before and after the name. Can this be removed somehow? I would like to have no pause in between.
s.SetOutputToDefaultAudioDevice()
s.SelectVoice(myENGLISHvoice)
s.SpeakAsync("Next on the line is mr. ")
s.SelectVoice(myGERMANvoice)
s.SpeakAsync("Stefan Hanswurst")
s.SelectVoice(myENGLISHvoice)
s.SpeakAsync("Please stand up.")
Update: I have also tried this, with no success (same problem):
pb.AppendSsmlMarkup("<voice xml:lang=""en-EN"">")
pb.AppendText("Next on the line is mr.")
pb.AppendSsmlMarkup("</voice>")
pb.AppendSsmlMarkup("<voice xml:lang=""de-DE"">")
pb.AppendText("Hansjörg Bratwurst ")
pb.AppendSsmlMarkup("</voice>")
pb.AppendSsmlMarkup("<voice xml:lang=""en-EN"">")
pb.AppendText("Please stand up.")
pb.AppendSsmlMarkup("</voice>")

In the context of speech engines, you usually avoid switching languages during speech output. Doing so is unusual anyway, since humans also simply stick to one pronunciation (see how Americans and Italians pronounce "coffee" or "cappuccino", for example).
Instead, foreign words are usually handled by inserting pronunciation hints for them into the language you are currently generating output for, just as Germans have to learn how to pronounce "cappuccino" and will still always say it with a German accent.
See the details for Microsoft's speech API here (search for "pronunciation"; the page has a spelling error):
https://msdn.microsoft.com/en-us/library/hh378454(v=office.14).aspx
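A minimal sketch of the pronunciation-hint approach, assuming the engine accepts SSML 1.0 (the W3C speech synthesis markup) including its `<phoneme>` element with the IPA alphabet; which alphabets a given engine actually honors varies, so check its SSML reference. The whole sentence stays in one English voice and only the German name carries a hint, so there is a single speak call and no voice switch. The helper name `english_prompt_with_hint` and the IPA transcription are illustrative assumptions, and the code only builds the markup; handing it to the engine is left out.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def english_prompt_with_hint(before: str, word: str, ipa: str, after: str,
                             lang: str = "en-US") -> str:
    """Build one SSML document in a single language, with an IPA hint for `word`."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"{escape(before)} "
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme> '
        f"{escape(after)}"
        "</speak>"
    )


ssml = english_prompt_with_hint(
    "Next on the line is Mr.",
    "Stefan Hanswurst",
    "ˈʃtɛfan ˈhansvʊʁst",  # illustrative IPA, not verified by a phonetician
    "Please stand up.",
)
root = ET.fromstring(ssml)  # sanity check: the result is well-formed XML
print(ssml)
```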

Is there an APL idiom to get a vector of all alphabetical characters?

I know you can get a character vector of all digits with ∊⍕¨⍳10 (with ⎕IO←0), but is there a platform-independent idiom for getting a vector of all alphabetical characters, aside from manually typing 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'? I know that I can do ⎕AV[(⍳26)+(⎕AV⍳'a')-1] to get all lowercase characters (and uppercase by changing the 'a' to 'A') in Dyalog APL, but I presume the system variable ⎕AV isn't available in other environments.

Not really.
In Dyalog APL, what I generally do is use ⎕A for the uppercase characters and ⎕UCS 96+⍳26 for the lowercase characters. (And ⎕A,⎕UCS 96+⍳26 for the whole alphabet.)
⎕AV is usually present, but its contents are not standard. (For example, NARS2000's ⎕AV is different from Dyalog's.) By the way, in Dyalog ⎕AV is considered deprecated in favour of ⎕UCS. Any APL that implements ⎕UCS will do it the same way, because Unicode is a set standard.
If you want a guaranteed implementation-independent, readable way to define the alphabet, I would indeed recommend just storing abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ in your workspace.
However, I would not recommend trying to write implementation-independent APL code to begin with. APL dialects are rather divergent, so this is decidedly nontrivial (if possible at all for complex code), and will be difficult to maintain.
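For readers less familiar with APL, the code-point arithmetic behind `⎕A,⎕UCS 96+⍳26` can be mirrored in Python. This is only an illustrative analogue, not APL code, and it assumes Dyalog's default index origin ⎕IO=1, so that ⍳26 yields 1 to 26.

```python
import string

# ⍳26 with ⎕IO=1 is 1..26; adding 96 gives code points 97..122, i.e. 'a'..'z'.
lowercase = "".join(chr(96 + i) for i in range(1, 27))

# ⎕A is the uppercase alphabet, code points 65..90.
uppercase = "".join(chr(64 + i) for i in range(1, 27))

# ⎕A,⎕UCS 96+⍳26
alphabet = uppercase + lowercase
print(alphabet)  # ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

# Python's standard library already ships these as constants.
assert alphabet == string.ascii_uppercase + string.ascii_lowercase
```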

Even though Quad names (⎕xxx) are usually case insensitive, MicroAPL distinguishes between ⎕A and ⎕a, so ⎕a,⎕A gives 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.

Yes, in the latest versions*, write ⎕A,819⌶⎕A, for ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.
* Latest builds of 14.1, and all versions of 15.0 and up.

Is it necessary to translate the direction character in a latitude/longitude?

This is possibly a subjective question (although I hope not). I am translating an application into European Spanish. This application provides an on-screen latitude and longitude display.
When displaying the direction of a given longitude in English, an example might be:
10° W 10' 2.42"
However, the word West translates to Oeste in Spanish. Is it the convention to leave the letter in the longitude in English, or to translate it to Spanish, like so:
10° O 10' 2.42"
Personally, I feel that as a maritime standard it needs no translation, but if anyone can point me to an example where this is not true, that would be much appreciated.
Thanks

Most languages do translate them – that's the case with Finnish, Swedish and German, at least. I'd still keep the English letters if it were really maritime software, since most people in specialized fields are accustomed to seeing the English letters denoting the hemispheres. Some translations might actually cause confusion if presented to an unfamiliar user; e.g. "south" in Finnish (etelä) would be abbreviated as E, which is the abbreviation for east in the original version.
Another option worth considering is using signed numbers, i.e. -10° for 10° W. That's quite widely used and unambiguous. If you aren't running out of space, I'd use the full names of the hemispheres instead of abbreviations, to stay on the safe side.
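Both options from this answer can be sketched in Python: a degrees/minutes/seconds string whose hemisphere letter comes from a per-language table (the Spanish row uses O for Oeste, as in the question), and a signed decimal value that needs no translation at all. The `HEMISPHERES` table and both function names are hypothetical, only English and Spanish are filled in, and the letter is placed after the seconds, a common layout that differs from the question's example.

```python
# (positive, negative) hemisphere letters per language and axis.
HEMISPHERES = {
    "en": {"lat": ("N", "S"), "lon": ("E", "W")},
    "es": {"lat": ("N", "S"), "lon": ("E", "O")},  # O = Oeste
}


def format_dms(value: float, axis: str, lang: str = "en") -> str:
    positive, negative = HEMISPHERES[lang][axis]
    letter = positive if value >= 0 else negative
    # Work in hundredths of an arc-second so rounding can never yield "60.00" seconds.
    total = round(abs(value) * 360000)
    degrees, rem = divmod(total, 360000)
    minutes, centisec = divmod(rem, 6000)
    seconds, hundredths = divmod(centisec, 100)
    return f"{degrees}° {minutes}' {seconds}.{hundredths:02d}\" {letter}"


def format_signed(value: float, places: int = 6) -> str:
    # Negative = south/west; language-neutral.
    return f"{value:.{places}f}°"


print(format_dms(-10.167339, "lon"))        # 10° 10' 2.42" W
print(format_dms(-10.167339, "lon", "es"))  # 10° 10' 2.42" O
print(format_signed(-10.167339))            # -10.167339°
```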

Are there any magic characters in Twilio <Say> verbs?

Are there any other characters I can use in a <Say> verb to help the pronunciation (assuming I have too many options to just record them all as MP3s)?
Thus far all I've found is hyphens to help with correctly detecting syllables:
Adgrok is pronounced "Addbrooke" but "Ad-grok" is pronounced correctly. "PagerDuty" is "pahdgerduty" but "pager-duty" is correct.
Capitals seem to be meaningless and spaces can introduce weirdness: "Mont Re Al." is pronounced "Mont Re Alabama"

Unfortunately, at this time there are no special punctuation marks that can help with pronunciation.

I sometimes use spaces and periods, but other than that it can be hard when you are using trade names. Another trick: I wanted it to say mysite.com, so I typed "mysite dot com"; likewise, I spell out numbers, e.g. "five" for 5. I think a lot of it comes down to trial and error.
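The trial-and-error tricks from these answers (hyphenating trade names, saying "dot com", spelling out digits) can be collected in one preprocessing step applied before the text reaches <Say>. A minimal sketch: the `RESPELLINGS` table and the `respell` function are hypothetical, their entries come from the examples in this thread, and whether a given respelling actually sounds right still has to be checked by ear.

```python
import re

# Hypothetical respelling table, grown by trial and error (entries from this thread).
RESPELLINGS = {
    "Adgrok": "Ad-grok",
    "PagerDuty": "pager-duty",
    "mysite.com": "mysite dot com",
}

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def respell(text: str) -> str:
    for written, spoken in RESPELLINGS.items():
        # Whole-word match; a function replacement avoids backslash handling in `spoken`.
        text = re.sub(rf"\b{re.escape(written)}\b", lambda _m, s=spoken: s, text)
    # Spell out standalone single digits: "5" -> "five".
    return re.sub(r"\b\d\b", lambda m: DIGIT_WORDS[int(m.group())], text)


print(respell("PagerDuty alert: visit mysite.com and press 5"))
# pager-duty alert: visit mysite dot com and press five
```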

The standard for doing this is SSML (http://www.w3.org/TR/speech-synthesis/). Twilio does not support it, but Tropo does with its speech synthesis verb 'say', so you can do something like this:
answer
say "<speak> I like squirrels!.
I <prosody rate='-10%'>like squirrels!</prosody>
I <prosody rate='-30%'>like squirrels!</prosody>
I <prosody rate='-50%'>like squirrels!</prosody>
</speak>"
hangup

Handling Grammar / Spelling Issues in Translation Strings

We are currently implementing a Zend Framework project that needs to be translated into 6 different languages. We already have a pretty sophisticated translation system, based on Zend_Translate, which also handles variables in translation keys.
Our project has a new Turkish translator, and we are facing a new issue: grammar, especially Turkish grammar. I suspect this problem exists in every translation system and in most languages, so I posted a question here.
Question: Any ideas how to handle translations like:
Key: I have a[n] {fruit}
Variables: apple, banana
Result: I have an apple. I have a banana.
Key: Stimme für {user}[s] Einsendung
Variables: Paul, Markus
Result: Stimme für Pauls Einsendung,
Result: Stimme für Markus Einsendung
Does anybody have a solution or idea for this? My only guess would be to avoid the problem by not using translations where these issues occur.
How do other platforms handle this?
Of course, the translation system has no idea which type of word it is placing where, in which type of sentence. It only does some string replacements...
PS: Turkish is even more complicated:
For example, on a profile page, we have "Annie's Network". This should translate as "Annie'nin Ağı".
If the first name ends in a vowel, the suffix will start with an n and look like "Annie'nin".
If the first name ends in a consonant, it will not have the first n, and look like "Kris'in".
If the last vowel is an a or ı, it will look like "Dan'ın" or "Seyma'nın".
If the last vowel is an o or u, it will look like "Davud'un" or "Burcu'nun".
If the last vowel is an e or i, it will look like "Erin'in" or "Efe'nin".
If the last vowel is an ö or ü, it will look like "Göz'ün" or "Eminönü'nün".
If the last letter is a k (like the name "Basak"), it will look like "Basağın" or "Eriğin".
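The suffix rules listed above (a buffer n after a final vowel, plus four-way vowel harmony on the last vowel) are regular enough to compute rather than store in every translation. A minimal sketch: `turkish_genitive` is a hypothetical helper that works on the written letters only and deliberately skips the k-to-ğ rule, so foreign names whose spelling and pronunciation differ (or any form a translator disagrees with) would still need a manual override. The translation string could then receive the already-inflected name as its variable.

```python
# Last vowel of the name -> vowel of the genitive suffix (four-way vowel harmony).
HARMONY = {
    "a": "ı", "ı": "ı", "o": "u", "u": "u",
    "e": "i", "i": "i", "ö": "ü", "ü": "ü",
    "A": "ı", "I": "ı", "O": "u", "U": "u",
    "E": "i", "İ": "i", "Ö": "ü", "Ü": "ü",
}


def turkish_genitive(name: str) -> str:
    """Return e.g. "Kris'in" or "Annie'nin" (buffer-n and vowel-harmony rules only)."""
    last_vowel = next((ch for ch in reversed(name) if ch in HARMONY), None)
    if last_vowel is None:
        raise ValueError(f"no vowel in {name!r}")
    buffer = "n" if name[-1] in HARMONY else ""  # buffer n only after a final vowel
    return f"{name}'{buffer}{HARMONY[last_vowel]}n"


for name in ("Annie", "Kris", "Dan", "Davud", "Efe", "Göz"):
    print(turkish_genitive(name))
# Annie'nin, Kris'in, Dan'ın, Davud'un, Efe'nin, Göz'ün (one per line)
```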

It is actually a very hard problem, as grammar rules differ even among languages from the same family. I don't think you could easily do anything for, say, Slavic languages...
However, if you want to solve this problem (because it is extra challenging) and you are looking for creative, cross-inspiring ways to do it, you might want to look into something called ChoiceFormat (for example, the one from the ICU project), or look up GNU gettext's solution to the plural forms problem.
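To make the gettext suggestion concrete: a gettext catalog declares a `Plural-Forms` header, and for Polish the GNU gettext manual gives `nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;`. The sketch below re-implements that expression in Python to show how one message gets three variants; `polish_plural_index`, the `FILES` entry and `files_message` are illustrative, and in practice gettext evaluates the header for you.

```python
def polish_plural_index(n: int) -> int:
    """Python version of the Polish Plural-Forms expression quoted above."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# Hypothetical catalog entry for "file": one variant per plural form.
FILES = ["{n} plik", "{n} pliki", "{n} plików"]


def files_message(n: int) -> str:
    return FILES[polish_plural_index(n)].format(n=n)


for n in (1, 2, 5, 12, 22):
    print(files_message(n))
# 1 plik, 2 pliki, 5 plików, 12 plików, 22 pliki (one per line)
```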

ICU (mentioned above) has a SelectFormat http://site.icu-project.org/design/formatting/select that may be of help: it's like ChoiceFormat but with arbitrary keywords. It also has a PluralFormat, which already has rules for many languages' plural forms.
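SelectFormat's idea (pick a message variant by keyword, with a mandatory `other` fallback, written in ICU message syntax like `{fruit_class, select, vowel {...} other {...}}`) can be sketched in Python for the a/an example from the question. It assumes the application stores a small piece of grammatical metadata with each variable; `select_format`, `FRUIT_CLASS`, `EN_HAVE_FRUIT` and the keyword names are hypothetical, and a real project would use ICU's own implementation.

```python
def select_format(cases: dict, keyword: str, **args: str) -> str:
    """Choose a variant by keyword, falling back to "other" like ICU select."""
    template = cases.get(keyword, cases["other"])
    return template.format(**args)


# Hypothetical per-variable metadata supplied alongside the translatable values.
FRUIT_CLASS = {"apple": "vowel", "banana": "consonant"}

# The English translator writes both variants; other languages can use other keywords.
EN_HAVE_FRUIT = {
    "vowel": "I have an {fruit}.",
    "other": "I have a {fruit}.",
}

for fruit in ("apple", "banana"):
    print(select_format(EN_HAVE_FRUIT, FRUIT_CLASS[fruit], fruit=fruit))
# I have an apple.
# I have a banana.
```

"consonant" has no case of its own here, so it falls through to `other`, which is why ICU requires that case in every select.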