In my app I need to open a website with the corresponding TLD for the user's country, say google.com, google.de, etc.
But I don't know which country codes are used in NSLocale's dictionary. Can I assume that the lowercase version of NSLocaleCountryCode can be appended as the TLD?
Regards, Erik
From Locales Programming Guide:
The region code is defined by ISO 3166-1
ISO 3166-1 is not equivalent to top-level domains, at least not in all cases. For instance, the ISO code for the United Kingdom is GB, but its TLD is .uk (as in .co.uk).
On the other hand, there are only six or so exceptions. See the Wikipedia entry.
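A minimal sketch of that lookup in Python, assuming a small exceptions table is enough. The names `CCTLD_EXCEPTIONS`, `cctld_for_region` and `localized_host` are hypothetical, and only the GB entry is filled in; the other exceptions would have to be added from the published lists.

```python
# Hypothetical exceptions table: ISO 3166-1 alpha-2 code -> ccTLD.
# Only GB is listed here; extend it with the other known exceptions.
CCTLD_EXCEPTIONS = {"GB": "uk"}


def cctld_for_region(region_code: str) -> str:
    """Return the ccTLD (without the dot) for an ISO 3166-1 alpha-2 code."""
    code = region_code.upper()
    return CCTLD_EXCEPTIONS.get(code, code.lower())


def localized_host(base: str, region_code: str) -> str:
    """Build a host name such as 'google.de' from a base name and a region."""
    return f"{base}.{cctld_for_region(region_code)}"


print(localized_host("google", "DE"))
print(cctld_for_region("GB"))
```

Even with the table in place, the site still has to exist under that TLD (many UK sites live under .co.uk rather than directly under .uk), so a per-site override list may be the safer design.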
Related
Some people will reply that domain names are not case-sensitive. In the new Unicode world this is no longer true.
(Source)
I thought one of the steps in the Unicode > Punycode conversion was a "normalisation", which rendered domain names lower case.
For old-fashioned ASCII-based domain names, yes: they have been and continue to be case-insensitive.
To quote RFC 1035, DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION:
Note that while upper and lower case letters are allowed in domain names, no significance is attached to the case. That is, two names with the same spelling but different case are to be treated as if identical.
For example, all of these represent the same domain:
example.com
Example.com
EXAMPLE.COM
EXampLE.com
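For ASCII-only names, the rule quoted above can be checked with a plain lowercase comparison. This is only a sketch; the helper name `same_ascii_domain` is made up for illustration, and it deliberately refuses non-ASCII input rather than guessing.

```python
def same_ascii_domain(a: str, b: str) -> bool:
    """Compare two ASCII domain names, ignoring case as RFC 1035 describes."""
    if not (a.isascii() and b.isascii()):
        # Non-ASCII (IDN) names need more than a lowercase comparison.
        raise ValueError("only ASCII domain names are handled here")
    return a.lower() == b.lower()


names = ["example.com", "Example.com", "EXAMPLE.COM", "EXampLE.com"]
print(all(same_ascii_domain(names[0], name) for name in names))
```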
In modern DNS, we now have Internationalized Domain Names (IDN), which allow Unicode characters. The problem is that defining upper- and lowercase can be tricky in some languages and character sets beyond ASCII (Unicode is a superset of US-ASCII).
The intent of domain names is to be case-insensitive, but there may be complications with particular characters in particular scripts of particular human languages. So there is no simple YES or NO answer to your question.
If using non-ASCII domain names, you should read:
Internationalized domain name on Wikipedia
Domain Name System (DNS) Case Insensitivity Clarification, the official spec (IETF RFC 4343)
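To see how a non-ASCII label behaves in practice, here is a small experiment using Python's built-in `idna` codec, which implements the older IDNA 2003 rules (RFC 3490). The helper name `to_ace` and the sample name are placeholders; other IDNA versions and libraries may treat some characters differently.

```python
def to_ace(domain: str) -> bytes:
    """Convert a domain name to its ASCII-compatible (Punycode) form."""
    return domain.encode("idna")


lower = to_ace("bücher.example")
upper = to_ace("BÜCHER.example")

# The preparation step of the codec case-folds the non-ASCII label,
# so both spellings end up as the same ASCII form.
print(lower)
print(upper)
print(lower == upper)
```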
WRONG: URLs are still case insensitive, even for IDN.
CORRECTION:
The question was about IDN: "Are IDN domain names case-sensitive?"
My initial answer is wrong, and does not clearly answer the question.
It brings URLs into the mix.
The domain name part (IDN) of a URL is case-insensitive.
The other elements might be case-insensitive or not. It depends on many things, and in general is not predictable.
For instance, the path part would normally depend on the OS or even the file system hosting the site (on macOS you can format the drive as case-insensitive or not).
But these days you can have some of these paths "hooked" to answer RESTful APIs.
So it depends on how the "hook" is done.
The same applies to other elements (user, password, parameters, parameter values).
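The split between the case-insensitive host and the other parts can be seen with Python's standard `urllib.parse`. The URL below is made up for illustration.

```python
from urllib.parse import urlsplit

parts = urlsplit("HTTP://Example.COM/Some/Path?Key=Value")

# urlsplit lowercases the scheme, and .hostname returns the host in lowercase.
print(parts.scheme)
print(parts.hostname)

# Path and query are returned untouched; whether their case matters
# is decided by the server, not by the URL syntax.
print(parts.path)
print(parts.query)
```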
I am building a small application for an English-speaking client in Japan. As part of the app, users need to be able to enter their address. Unfortunately, I can't find any reference for how addresses are usually handled in an online form.
I know that there are different combinations of wards/prefectures/cities; do these all usually have their own field in a database? Is it standard for all of that to go into a general "city" type of field? What's the standard UI for this sort of thing?
The Universal Postal Union has compiled info on address formats in different countries. See also an unofficial guide to postal addresses.
But as a rule, internationalization of software typically means that for postal addresses, you avoid imposing any specific format. Instead, you would use a free-form text input area, of sufficient size. There are often many forms of addresses used in a country (and Japan is no exception), and normally you need not enforce any specific format – instead, you expect people to know their own address and how to enter it so that postal services can understand it.
It depends on what you have to do with the address.
If you have to:
check for obligatory fields,
validate fields, or
query for city, prefecture, postal code, etc.,
then you should use separate fields. UI: a form with text inputs (and maybe even menus).
Do not use more fields than necessary: if you don't have any of the needs mentioned above, just use a single text field (UI: textarea).
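If separate fields are needed, the required-field and validation checks could look roughly like this. The field names and the `JapaneseAddress` class are assumptions made for this sketch, not a standard schema; the only format rule encoded is that Japanese postal codes have seven digits, usually written as 123-4567.

```python
import re
from dataclasses import dataclass

# Seven ASCII digits, with an optional hyphen after the third one.
POSTAL_CODE = re.compile(r"[0-9]{3}-?[0-9]{4}")


@dataclass
class JapaneseAddress:
    """Hypothetical structured address; field names are illustrative only."""
    postal_code: str
    prefecture: str
    city: str
    street: str
    building: str = ""  # optional

    def problems(self) -> list[str]:
        """Return a list of human-readable validation problems."""
        found = []
        for name in ("postal_code", "prefecture", "city", "street"):
            if not getattr(self, name).strip():
                found.append(f"{name} is required")
        if self.postal_code and not POSTAL_CODE.fullmatch(self.postal_code):
            found.append("postal_code must look like 123-4567")
        return found


print(JapaneseAddress("100-0001", "東京都", "千代田区", "千代田1-1").problems())
```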
The first part of a Japanese address is easy: Todofuken will either be 2 or 3 characters, followed by either "都","道","府" or "県". Where it gets tricky is the rest of the address since smaller areas don't always divide their cities neatly.
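The prefecture rule described above can be sketched as a regular expression. This only checks the shape (two or three characters plus one of the four suffixes); it does not verify the name against the real list of prefectures, and the function name `split_prefecture` is made up for the example.

```python
import re

# Two or three characters (as few as possible), then 都, 道, 府 or 県.
PREFECTURE = re.compile(r"(.{2,3}?[都道府県])(.*)")


def split_prefecture(address: str):
    """Split an address into (prefecture, rest), or return None if no match."""
    match = PREFECTURE.fullmatch(address)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_prefecture("東京都千代田区千代田1-1"))
print(split_prefecture("神奈川県横浜市西区"))
print(split_prefecture("京都府京都市"))
```

The lazy quantifier matters for names like 京都府, where the second character is itself one of the suffix characters.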
What I've seen to make this easier is using the postal code to render the address. The bad news is that I haven't seen any of this in Ruby but I have seen it in other languages so hopefully this will help.
This site is only in Japanese, but maybe you can download the code and check it out:
http://www.kawa.net/works/ajax/ajaxzip2/ajaxzip2.html
There's also this add-in for Excel that converts addresses. The code may be helpful to you:
http://office.microsoft.com/ja-jp/excel-help/HP010077514.aspx
Hope this helps.
I'm working on an internationalized database application that supports multiple locales in a single instance. When international users sort data in the applications built on top of the database, the database theoretically sorts the data using a collation appropriate to the locale associated with the data the user is viewing.
I'm trying to find sorted lists of words that meet two criteria:
the sorted order follows the collation rules for the locale
the words listed will allow me to exercise most / all of the specific collation rules for the locale
I'm having trouble finding such trusted test data. Are such sort-testing datasets currently available, and if so, what / where are they?
"words.en.txt" is an example text file containing American English text:
Andrew
Brian
Chris
Zachary
I am planning on loading the list of words into my database in randomized order, and checking to see if sorting the list conforms to the original input.
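The check described here could be sketched as follows. The `sort` argument stands in for whatever actually sorts the data (in the real test, a database query with the locale's collation); Python's built-in `sorted` is used below only as a placeholder, and the function name is hypothetical.

```python
import random


def sorts_back_to_reference(reference, sort, seed=0):
    """Shuffle a correctly ordered word list, sort it, and compare the result."""
    shuffled = list(reference)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the test repeatable
    return sort(shuffled) == list(reference)


words_en = ["Andrew", "Brian", "Chris", "Zachary"]
print(sorts_back_to_reference(words_en, sorted))
```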
Because I am not fluent in any language other than English, I do not know how to create sample datasets like the following sample one in French (call it "words.fr.txt"):
cote
côte
coté
côté
The French prefer diacritical marks to be ordered right to left. If you sorted that using code-point order, it likely comes out like this (which is an incorrect collation):
cote
coté
côte
côté
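The difference between the two orders can be reproduced with a toy sort key. This is a deliberately simplified illustration of the "accents compared from right to left" idea, not the real Unicode collation algorithm and not what a database does internally; `french_key` is a made-up helper.

```python
import unicodedata


def french_key(word):
    """Toy key: base letters first, then accents compared from the last letter."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            accents[-1] += ch  # attach the mark to the preceding base letter
        else:
            bases.append(ch)
            accents.append("")
    return ("".join(bases), tuple(reversed(accents)))


words = [unicodedata.normalize("NFC", w) for w in ["côté", "coté", "côte", "cote"]]

code_point_order = sorted(words)             # plain code-point comparison
french_order = sorted(words, key=french_key)  # accents weighed right to left

print(code_point_order)
print(french_order)
```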
Thank you for the help,
Chris
Here's what I found.
The Unicode Common Locale Data Repository (CLDR) is pretty much the authority on collations for international text. I was able to find several lists of words conforming to the rules found in CLDR in the ICU Project's ICU Demonstration - Locale Explorer tool. It turns out that ICU (International Components for Unicode) uses CLDR rules to help solve common internationalization problems. It's a great library; check it out.
In some cases, it was useful to construct some nonsense terms by reverse-engineering the CLDR rules directly. Search engines available in the United States were not suited for finding foreign terms with the case/diacritic/other nuances I was interested in for this testing (in retrospect, I wonder if international search engines would have been better-suited for this task).
I want a list of all countries together with the encryption standards that are not allowed in each country.
Example: in some countries encryption is not allowed at all, and in others the encryption strength must not be greater than 64 bits.
Thanks
Sunil Kumar Sahoo
The resource you are looking for is the Crypto Law Survey.
I do not believe such a list exists.
You need to research each country and build such a list on your own. The best way is, of course, to have a lawyer investigate that for you.
Anyway, what do you need it for? If it's a web application that resides on servers in country X, you only need to comply with that country's requirements. The fact that people can access your application from anywhere in the world will not be your concern.
The ones I know of:
Russia
Ukraine
Crimea
Egypt
Kazakhstan
Israel
Turkey
China
Pakistan
Check with a lawyer (as this data would come from the laws of that specific country, which tend to change); then make a list, e.g.
<country id="US">
<bannedcipher type="rot13" />
</country>
and keep it up to date (that's the hard part). AFAIK, there's no reliable way to get this list programmatically; also, the encryption allowed may vary according to the use (e.g. "In Elbonia, everybody except the military is hereby banned from using XOR").
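Once such a list exists, reading it is the easy part. A rough sketch with Python's standard `xml.etree.ElementTree` follows; the `<countries>` root element, the second entry and the function name are assumptions added for the example, and the data is placeholder data, not legal information.

```python
import xml.etree.ElementTree as ET

# Placeholder data in the format shown above, wrapped in a hypothetical root.
SAMPLE = """
<countries>
  <country id="US">
    <bannedcipher type="rot13" />
  </country>
  <country id="Elbonia">
    <bannedcipher type="xor" />
  </country>
</countries>
""".strip()


def banned_ciphers(xml_text: str, country_id: str) -> set:
    """Return the set of banned cipher types listed for one country."""
    root = ET.fromstring(xml_text)
    for country in root.findall("country"):
        if country.get("id") == country_id:
            return {c.get("type") for c in country.findall("bannedcipher")}
    return set()  # country not listed


print(banned_ciphers(SAMPLE, "US"))
```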
I have a commercial site belonging to a Japanese company that wants good Google search results for a keyword written in Katakana or Kanji (non-ASCII characters). Does it still matter to put the closest English word in the site's DNS name?
For example, if the search word is "homepage" in Katakana: ホームページ
Will the DNS name have an impact on the results?
Is it better, or does it have any effect, to have a DNS name that includes "homepage"?
Thanks,
Ric
What name will bring higher hit counts is kind of an art, not a science.
Since IDN (Internationalized Domain Names) support is still weak in a lot of tools, I would think that a Japanese DNS name would bring fewer hits than an English one.
On the other hand, in my experience the content and proper tagging of the content are far more important than the domain name itself for attracting traffic.
Internationalized domain names are translated to unreadable ASCII, so I guess it doesn't make sense to use them for SEO.
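To see what that ASCII form looks like for the keyword in the question, Python's built-in `idna` codec (IDNA 2003) can do the conversion. The `.example` suffix is a placeholder, not a real registration.

```python
name = "ホームページ.example"

# ASCII-compatible form that is actually sent to DNS.
ace = name.encode("idna")
print(ace)

# Tools with IDN support convert it back for display.
print(ace.decode("idna"))
```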