Where can I find a Japanese language support package for Smalltalk - smalltalk

I am looking for a Japanese language support package for Squeak 4.1. Can you give me some hints on the topic? Thanks in advance!

Try here.
Description: An install script to set up Japanese fonts, locale, and some localization patches. This is for developers who use the latest Squeak but need to handle Japanese characters.

Related

ML Kit text recognition for non-English languages

I am trying to build a Japanese OCR using Google ML Kit.
It works for English, but I have had no luck with Japanese.
Can you please provide an example of how to do this?
Thank you!
The current Text Recognition in ML Kit does not support non-English languages, but we are adding support for more languages, including Japanese, which should be available within a couple of releases.

How to Build a Translation System

I want to build a translation system to translate text from English to Arabic.
What kind of language should I use?
Also, what steps should I follow to build this system?
You should first think about an algorithm for translation; after that, you can use whatever language you like, depending on where the application will run (web, phone, terminal, ...). I think translation systems often work with a graph implementation.
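To make the "algorithm first, language second" point concrete, here is a deliberately minimal sketch in Java of the simplest possible approach: word-for-word dictionary lookup. The `ToyTranslator` class and its two dictionary entries are illustrative placeholders, not a real English-Arabic lexicon, and real translation systems (statistical or neural) are far more involved; this only shows the skeleton you would start from.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

// Toy word-for-word translator. The dictionary entries are
// illustrative placeholders, not a real English-Arabic lexicon.
public class ToyTranslator {
    private final Map<String, String> dictionary = new HashMap<>();

    public ToyTranslator() {
        dictionary.put("hello", "مرحبا");
        dictionary.put("world", "عالم");
    }

    public String translate(String sentence) {
        StringJoiner out = new StringJoiner(" ");
        for (String word : sentence.toLowerCase().split("\\s+")) {
            // Fall back to the original word when no entry exists.
            out.add(dictionary.getOrDefault(word, word));
        }
        return out.toString();
    }
}
```

Once this skeleton works, the interesting part is replacing the lookup with something that handles word order, morphology, and context, which is where the graph-based or machine-learning approaches come in.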

Carrot2 API does not support the Japanese language

I am trying to use the Carrot2 API to cluster documents in Japanese. It throws this warning:
org.carrot2.text.linguistic.DefaultTokenizerFactory: Tokenizer for Japanese (ja) is not available. This may degrade clustering quality of Japanese content.
As a result, the clustering process fails and all documents end up in the "Other Topics" cluster.
Is there any way to solve this problem?
Thanks in advance.
The open-source algorithms available in Carrot2 unfortunately do not support Japanese. The constant was added to cover possible future support for Japanese.
Alternatively, you can try running Carrot2 with a customized linguistic pipeline; the UsingCustomLanguageModel example class in the Carrot2 Java API distribution shows how to do it.

Internationalization string testing

Some people use look-alike Unicode symbols to replace English characters to test internationalization, e.g. "Test" becomes "Ťėşŧ". Is there a well-known name for this technique? Are there utilities, keyboard layouts, or translation tools for this "language"?
This technique is called pseudolocalization; see the Wikipedia article here: http://en.wikipedia.org/wiki/Pseudolocalization
Windows Vista comes with three pseudo-locales for testing. The "Using Pseudo-Locales for Localization Testing" MSDN article may be a good place to start.
Beyond this, any tool (beyond what you would use for regular localization, such as a translator) would depend on what platform you are developing for (and so how your data is stored).
As for keyboard layouts - any will do. But don't forget about IMEs.
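If no platform tool is available, pseudolocalization is simple enough to roll yourself. The following Java sketch shows the core idea: substitute ASCII letters with look-alike accented characters and wrap the string in brackets so untranslated or clipped strings stand out in the UI. The `PseudoLocalizer` class name and the particular character mapping are my own illustrative choices, not a standard.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal pseudolocalizer: replaces ASCII letters with look-alike
// accented characters and brackets the result, so hard-coded English
// strings and truncation problems are visible during i18n testing.
// The mapping below is illustrative and deliberately incomplete.
public class PseudoLocalizer {
    private static final Map<Character, Character> LOOKALIKES = new HashMap<>();
    static {
        LOOKALIKES.put('T', 'Ť');
        LOOKALIKES.put('e', 'ė');
        LOOKALIKES.put('s', 'ş');
        LOOKALIKES.put('t', 'ŧ');
        LOOKALIKES.put('a', 'á');
        LOOKALIKES.put('o', 'ö');
    }

    public static String pseudolocalize(String input) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : input.toCharArray()) {
            sb.append(LOOKALIKES.getOrDefault(c, c));
        }
        // Brackets reveal strings that get cut off in the UI.
        return sb.append(']').toString();
    }
}
```

Running your resource strings through such a transform before a test build exercises the same code paths as a real translation, without waiting for translators.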

Analyzer for Russian language in Lucene and Lucene.Net

Lucene has quite poor support for Russian language.
RussianAnalyzer (part of lucene-contrib) is of very low quality.
RussianStemmer module for Snowball is even worse. It does not recognize Russian text in Unicode strings, apparently assuming that some bizarre mix of Unicode and KOI8-R must be used instead.
Do you know any better solutions?
My answer is probably too late, but for the record, I've found the analyzers from the AOT project much better than those shipped with Lucene.
I used http://code.google.com/p/russianmorphology/
If all else fails, use Sphinx.
The project http://code.google.com/p/russianmorphology/ has moved to https://github.com/AKuznetsov/russianmorphology. Please take the project's new home into account.
That's the beauty of open source: you have the source code, so if the current implementations don't work for you, you can always create your own or, even better, extend the existing ones.
A good start would be the "Lucene in Action" book.