gem - search Chinese with pg_search - ruby-on-rails-3

I am using pg_search. It works fine with English, but I cannot use it to search Chinese content. When I enter Chinese as input, I get:
NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored
I can search Chinese content with the ActiveRecord where API, so I don't think the problem is my database settings.
How should I configure pg_search to work with Chinese?

Related

UTF-8 vs. ASCII-8BIT Encoding Inconsistency in Ruby on Rails

I know there are many questions about rails and encoding out there, but I haven't been able to find anything about this specific question.
I have a Ruby on Rails application using Rails 3.1.3 and running under JRuby 1.6.7. We support both English and French, and we use the I18n library/gem to accomplish this.
Sample translation file parts:
#---- config/locales/en.yml ----
en:
  button_label_verify: "Verify"
#---- config/locales/fr.yml ----
fr:
  button_label_verify: "Vérifier"
In certain cases I am getting the following encoding error:
Internal Server Error: Encoding::CompatibilityError incompatible character encodings: UTF-8 and ASCII-8BIT
Case 1:
#---- app/views/_view_page.html.erb ----
.....
<h3><%= get_button_label() %></h3>
....
#---- app/helpers/page_helper.rb ----
def get_button_label
return I18n.t(:button_label_verify)
end
This works - there are no encoding errors and translations between French and English work just fine.
Case 2:
#---- app/views/_view_page.html.erb ----
.....
<h3><%= get_button_label() %></h3>
....
#---- app/helpers/page_helper.rb ----
def get_button_label
return "#{I18n.t(:button_label_verify)}"
end
This, however, does not work. The only difference is that the returned value is built with string interpolation, as opposed to something like
return "string " + I18n.t(:button_label_verify)
Note: The above causes no errors either. The encoding issue occurs only when the I18n translation is interpolated inside the quotes.
Case 3:
#---- app/views/_view_page.html.erb ----
.....
<h3><%= "#{I18n.t(:button_label_verify)}" %></h3>
....
This causes no error... so the problem seems to somehow be related to the dynamic code (with French characters) within the string, on top of printing out a string returned from a helper function.
I know how to work around this/fix it, but what I am wondering is whether anyone can provide some insight into why it is this way. Is it this way for any good reason? IMO, when you get down to the low level, printing out a string is printing out a string, so I don't understand how one way causes an error and another way doesn't.
Putting
#encoding: utf-8
at the top of your files containing non-ASCII characters should fix encoding-related issues (at least the ones coming from project files ...)
I couldn't tell why it doesn't work on a helper when using interpolation though ...
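For what it's worth, the error itself can be reproduced in plain Ruby 1.9+ without Rails. The sketch below is not the poster's code; try_concat and the variable names are made up for illustration. It assumes that one of the strings ends up tagged ASCII-8BIT somewhere along the way, which the thread does not confirm.

```ruby
# Hypothetical helper: reports whether two strings can be concatenated.
def try_concat(a, b)
  a + b
  :ok
rescue Encoding::CompatibilityError
  :incompatible
end

utf8 = "V\u00E9rifier"                                    # UTF-8, contains a non-ASCII character
binary = utf8.dup.force_encoding("ASCII-8BIT")            # same bytes, tagged as binary
ascii_binary = "Verify".dup.force_encoding("ASCII-8BIT")  # binary tag, but ASCII-only bytes

mixed_result = try_concat(utf8, binary)        # both sides hold non-ASCII bytes
ascii_result = try_concat(utf8, ascii_binary)  # one side is pure ASCII

puts mixed_result
puts ascii_result
```

Ruby only raises when both strings contain non-ASCII bytes and carry different encoding tags, which would be consistent with "Verify" never triggering the error while "Vérifier" can.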
Sometimes you need to set the $KCODE global variable for the file (this is important for Ruby 1.8 compatibility):
# encoding: UTF-8
$KCODE = 'UTF8' unless RUBY_VERSION >= '1.9'
It could also be that your files are not actually encoded in UTF-8. For that you need more than just the plain-text header. In Eclipse the setting is hidden under Preferences -> General -> Editors -> Spelling, and in Notepad and most Windows programs it is chosen when you Save As the file. The enca command is one way of doing it on Linux, but I'm sure there are others. I can't count the times I have seen a file say it is UTF-8 when it is actually in some other encoding: UTF-8 is identical to ASCII for 7-bit characters, so you often don't notice the problem until you check the bytes in a hex editor.
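A quick way to check this from Ruby itself, rather than a hex editor, might look like the following. It is only a sketch: utf8_report is a hypothetical helper, and the sample bytes stand in for the contents of a real file.

```ruby
# Hypothetical helper: inspects raw bytes for UTF-8 validity and a leading BOM.
def utf8_report(bytes)
  as_utf8 = bytes.dup.force_encoding("UTF-8")
  {
    :valid_utf8 => as_utf8.valid_encoding?,
    :has_bom => bytes.bytes.first(3) == [0xEF, 0xBB, 0xBF]
  }
end

# The same word saved in two different encodings, as raw bytes.
utf8_bytes = "V\u00E4lkommen".dup.force_encoding("ASCII-8BIT")
latin1_bytes = "V\u00E4lkommen".encode("ISO-8859-1").force_encoding("ASCII-8BIT")

utf8_check = utf8_report(utf8_bytes)
latin1_check = utf8_report(latin1_bytes)

puts utf8_check.inspect
puts latin1_check.inspect
```

For a real file you would pass in File.binread(path) instead of the sample bytes. Note that a file containing only ASCII passes this check whatever encoding the editor claims, which is exactly why the problem goes unnoticed for so long.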
Please take some time to read about file encoding:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets
Why perl doesn't use UTF-8
Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)
This is incredibly important to get right and it can save you a lot of pain later when you port to East Asian languages (you should always plan to do this!)

Rails utf-8 problem

Hi there, I'm new to Ruby (and Rails) and having some problems when using Swedish letters in strings. In my action I create an instance variable like this:
@title = "Välkommen"
And I get the following error:
invalid multibyte char (US-ASCII)
syntax error, unexpected $end, expecting keyword_end
@title = "Välkommen"
^
What's happening?
EDIT: If I add:
# coding: utf-8
at the top of my controller it works. Why is that and how can I solve this "issue"?
See Joel Spolsky's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".
To quote the part that answers this question concisely:
The Single Most Important Fact About Encodings
If you completely forget everything I just explained, please remember
one extremely important fact. It does not make sense to have a string
without knowing what encoding it uses. You can no longer stick your
head in the sand and pretend that "plain" text is ASCII.
This is why you must tell Ruby which encoding your file uses. Since the encoding is not recorded in any metadata associated with the file, some software assumes ASCII until it knows better. Ruby 1.9 probably does the same until it reaches your comment, at which point it switches to decoding the file as UTF-8.
Obviously, if you used some other Unicode encoding or a more local encoding for your Ruby file, you would need to change the comment to indicate the correct encoding.
The "magic comment" in Ruby 1.9 (which Rails 3 supports) tells the interpreter what encoding to expect. It is important because in Ruby 1.9, every string has an encoding. Prior to 1.9, every string was just a sequence of bytes.
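To make the "every string has an encoding" point concrete, here is a small sketch (the variable names are only illustrative) that tags the same three bytes with two different encodings and shows that Ruby 1.9+ then counts characters differently:

```ruby
bytes = [0x56, 0xC3, 0xA4].pack("C*")   # raw bytes; pack("C*") tags them ASCII-8BIT

as_utf8 = bytes.dup.force_encoding("UTF-8")         # read as "V" + one two-byte character
as_latin1 = bytes.dup.force_encoding("ISO-8859-1")  # read as three one-byte characters

utf8_length = as_utf8.length
latin1_length = as_latin1.length
byte_count = bytes.bytesize

# Transcoding the Latin-1 reading to UTF-8 does not give back the UTF-8 reading.
same_text = as_latin1.encode("UTF-8") == as_utf8

puts [utf8_length, latin1_length, byte_count, same_text].inspect
```

The bytes never change here; only the encoding tag does. That is the information the magic comment supplies for string literals in a source file.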
A very good description of the issue is in James Gray's series of blog posts on Ruby and Unicode. The one that is exactly relevant to your question is http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings (but see the others because they are very good).
The important line from the article:
The first is the main rule of source Encodings: source files receive a US-ASCII Encoding, unless you say otherwise.
There are several places that can cause problems with UTF-8 encoding, but a few tricks solve them:
Make sure that every file in your project is UTF-8 encoded (if you are using RadRails, this is simple to accomplish: mark your project, select properties, and in the "text-file-encoding" box select "other: utf-8"). Be sure to re-enter your special "å,ä,ö" characters in your files afterwards, or you'll get a MySQL error, because the conversion will have changed your "å,ä,ö" to a "square" (unknown character).
In your database.yml, set the encoding for each server environment (in this example "development" with MySQL):
development:
  adapter: mysql
  encoding: utf8
Set a before filter in your application controller (application.rb, or application_controller.rb in Rails 2.3 and later):
class ApplicationController < ActionController::Base
  before_filter :set_charset
  def set_charset
    headers["Content-Type"] = "text/html; charset=utf-8"
  end
end
Be sure to set the encoding to UTF-8 in MySQL for every table (I've only used MySQL, so I don't know about other databases). If you use MySQL Administrator you can do it like this: edit the table, open the "table option" tab, and change the charset to "utf8" and the collation to "utf8_general_ci".
(Courtesy: kombatsanta)

Ruby on Rails escape umlauts in url

I'm trying to post some parameters containing umlauts to a URL (a PHP script), so I have to escape the parameters. But Ruby returns an unexpected string.
PHP:
urlencode("äöü");
output: %E4%F6%FC
and RoR:
URI.escape("äöü")
output: %C3%A4%C3%B6%C3%BC
or:
CGI.escape("äöü")
output: %C3%A4%C3%B6%C3%BC
I'm working with Rails 3.0.5 and Ruby 1.9.2, and my application is set up for UTF-8. What am I doing wrong, or what should I do?
Thanx andi
Welcome to the wonderful world of String encodings. As you noted, Ruby is configured for UTF-8, whereas your installation of PHP looks like it's trying to encode using ISO 8859-1.
To solve this, you need to make sure both of your scripts operate with the same encoding, or explicitly convert your URL parameters from UTF-8 to ISO 8859-1.
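The explicit conversion could be sketched like this. Note that it uses URI.encode_www_form_component from the standard library instead of the CGI.escape call from the question (I would expect CGI.escape to treat these inputs the same way, but have not verified that on every Ruby version), and the variable names are only illustrative:

```ruby
require "uri"

text = "\u00E4\u00F6\u00FC"          # "äöü" as a UTF-8 string
latin1 = text.encode("ISO-8859-1")   # transcode: now one byte per character

# Percent-encoding works on bytes, so the result depends on the string's encoding.
utf8_escaped = URI.encode_www_form_component(text)
latin1_escaped = URI.encode_www_form_component(latin1)

puts utf8_escaped
puts latin1_escaped
```

Whether converting on the Ruby side is the right fix depends on what the PHP script expects; if you control it, switching the PHP side to UTF-8 is probably the more future-proof option.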
Maybe you should use something like this:
CGI.escape("äöü")
If you get an error, try adding require 'cgi' first.

rails3 globalize3 migrate_data problem

The data migrates, but all Cyrillic symbols are replaced with "?". Everything is all right with Latin symbols.
Fixed it by setting the charset in MySQL manually.

Inserting special character in Redmine wiki page

I'm using Redmine and I'm trying to insert the special character | inside a table in a Redmine wiki page. I don't want this character to be parsed as a column separator.
I've achieved this by wrapping the character in <code>|</code>, but I don't want to use the code tag, since the character then picks up code styling, namely the Courier New font.
Is there a tag for displaying plain text and avoid the parsing from the Redmine wiki engine?
I'm reading the redmine wiki formatting documentation but it is very poor and points me to textile formatting which doesn't seem to include this special case.
I could not get the exclamation point to work, but this works for me.
<notextile>|</notextile>
The only way I found to overcome this problem is to insert the HTML code for the character I want to isolate. For instance, instead of typing an underscore and making the wiki think I'm starting an italic word, I have to type its HTML code:
&#95;
Example:
this is a &#95;test - _text comment here_
Without the underscore code (&#95;), the Redmine wiki engine will think that the italics start at test, and this is the wrong result:
this is a test - text comment here
So, using the HTML code for the underscore corrects this problem. Unfortunately, this parsing is not very clever (yet, I hope).
Here is a link for an ASCII code table with many symbols and characters:
http://www.ascii.cl/htmlcodes.htm
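If you would rather compute the code than look it up in a table, a one-line Ruby helper is enough. This is only a convenience sketch; numeric_entity is a made-up name, and I have not checked which of these codes Redmine's table parser accepts.

```ruby
# Hypothetical helper: builds the numeric HTML code for a single character.
def numeric_entity(char)
  format("&#%d;", char.ord)
end

underscore_code = numeric_entity("_")
pipe_code = numeric_entity("|")

puts underscore_code
puts pipe_code
```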