Rails: ActiveRecord db sort operation case insensitive - sql

I am trying to learn rails [by following the SAAS course in coursera] and working with simple Movie table using ActiveRecord.
I want to display all movies with title sorted. I would like it to be sorted case insensitively.
I tried doing it this way:
Movie.all(:conditions => ["lower(title) = ?", title.downcase],:order => "title DESC")
=>undefined local variable or method `title' for #<MoviesController:0xb4da9a8>
I think it doesn't recognise lower(title).
Is this the best way to achieve a case-insensitive sort?
Thanks!

Use where instead of all:
Movie.where("lower(title) = ?", title.downcase).order("title DESC")
I don't really understand the sort, though. This returns only the movies whose lowercased title equals title.downcase, so every title is the same apart from case and sorting by title DESC does very little. (title also has to be defined in the controller, which is what the error message is complaining about.)
To sort all movies reverse-alphabetically by lowercase title:
Movie.order("lower(title) DESC").all
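To see what ordering on the lowercased title changes, here is a minimal plain-Ruby sketch of the same idea done in memory. The sample titles and the method name are made up for illustration; in the app the database does this work through lower(title).

```ruby
# In-memory analogue of ORDER BY lower(title) DESC.
# The method name and sample titles are hypothetical.
def sort_titles_desc(titles)
  # Compare on the downcased title, but return the original strings.
  titles.sort_by { |title| title.downcase }.reverse
end

titles = ["aladdin", "Zorro", "Amelie", "babe"]

# A plain sort compares bytes, so every uppercase letter sorts before lowercase.
puts titles.sort.reverse.inspect
puts sort_titles_desc(titles).inspect
```

The plain sort puts "babe" ahead of "Zorro" because "b" has a higher byte value than "Z"; sorting on the downcased key removes that effect.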

You have to do this:
Movie.order("lower(title) DESC").all

A more robust solution is to use arel nodes. I'd recommend defining a couple scopes on the Movie model:
scope :order_by_title, -> {
  order(arel_table['title'].lower.desc)
}
# the lambda argument goes after the arrow: ->(title) { ... }
scope :for_title, ->(title) {
  where(arel_table['title'].lower.eq(title.downcase))
}
and then call Movie.for_title(title).order_by_title
The advantage over other answers listed is that .for_title and .order_by_title won't break if you alias the title column or join to another table with a title column, and they are sql escaped.
Like rickypai mentioned, if you don't have an index on the column, the database will be slow. However, it's bad (normal) form to copy your data and apply a transform to another column, because then one column can become out of sync with the other. Unfortunately, earlier versions of MySQL didn't allow for many alternatives other than triggers. After 5.7.5 you can use virtual generated columns to do this. Then, in case-insensitive cases, you just use the generated column (which actually makes the Ruby more straightforward).
Postgres has a bit more flexibility in this regard, and will let you make indexes on functions without having to reference a special column, or you can make the column a case insensitive column.

Having MySQL perform an upper- or lower-case conversion on every sort is quite expensive.
What I recommend is having a title column and a title_lower column. This way, you can easily display and sort with case insensitivity on the title_lower column without having MySQL perform upper or lower each time you sort.
Remember to index both or at least title_lower.

Related

How to simulate ActiveRecord Model.count.to_sql

I want to display the SQL used in a count. However, Model.count.to_sql will not work because count returns a Fixnum, which doesn't have a to_sql method. I think the simplest solution is to do this:
Model.where(nil).to_sql.sub(/SELECT.*FROM/, "SELECT COUNT(*) FROM")
This creates the same SQL as is used in Model.count, but is it going to cause a problem further down the line? For example, if I add a complicated where clause and some joins.
Is there a better way of doing this?
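One concrete way the regex can go wrong, sketched in plain Ruby on hand-written SQL strings (the strings and method names are illustrative, not output captured from Rails): `.*` is greedy, so once the query contains a subquery the match runs to the last FROM on the line.

```ruby
# Greedy version from the question: ".*" runs to the LAST "FROM".
def greedy_count_sql(sql)
  sql.sub(/SELECT.*FROM/, "SELECT COUNT(*) FROM")
end

# Non-greedy variant: ".*?" stops at the FIRST "FROM".
def lazy_count_sql(sql)
  sql.sub(/SELECT.*?FROM/, "SELECT COUNT(*) FROM")
end

# Hand-written examples of the kind of SQL a relation might produce.
simple = 'SELECT "models".* FROM "models"'
nested = 'SELECT "models".* FROM "models" WHERE "models"."id" IN ' \
         '(SELECT model_id FROM child_models)'

puts greedy_count_sql(simple)  # fine for the simple case
puts greedy_count_sql(nested)  # swallows the outer query
puts lazy_count_sql(nested)    # keeps the outer query intact
```

The non-greedy variant is still only a patch: it would break in the opposite direction if the select list itself contained a subquery with its own FROM.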
You can try
Model.select("count(*) as model_count").to_sql
You may want to dip into Arel:
Model.select(Arel.star.count).to_sql
ASIDE:
I find I often want to find sub counts, so I embed the count(*) into another query:
child_counts = ChildModel.select(Arel.star.count)
.where(Model.arel_attribute(:id).eq(
ChildModel.arel_attribute(:model_id)))
Model.select(Arel.star).select(child_counts.as("child_count"))
.order(:id).limit(10).to_sql
which then gives you all the child counts for each of the models:
SELECT *,
(
SELECT COUNT(*)
FROM "child_models"
WHERE "models"."id" = "child_models"."model_id"
) child_count
FROM "models"
ORDER BY "models"."id" ASC
LIMIT 10
Best of luck
UPDATE:
Not sure if you are trying to solve this in a generic way or not. Also not sure what kind of scopes you are using on your Model.
We do have a method that automatically calls a count for a query that is put into the ui layer. I found using count(:all) is more stable than the simple count, but sounds like that does not overlap your use case. Maybe you can improve your solution using the except clause that we use:
scope.except(:select, :includes, :references, :offset, :limit, :order)
.count(:all)
The where clause and the joins necessary for the where clause work just fine for us. We tend to want to keep the joins and the where clause, since they need to be part of the count. You definitely want to remove the includes (which Rails should remove automatically, in my opinion), but the references are much trickier, especially when one references a has_many and requires a distinct; that starts to throw a wrench in there. If you need to use references, you may be able to convert these over to a left_join.
You may want to double check the parameters that these "join" methods take. Some of them take table names and others take relation names. Later Rails versions have gotten better and take relation names, so be sure you are looking at the docs for the right version of Rails.
Also, in our case, we spend more time trying to get sub selects with more complicated relationships, we have to do some munging. Looks like we are not dealing with where clauses as much.

Rails / SQL: Find attributes with same value but different capitalization

This may be really basic, but I can't think of how to write a SQL query that would find strings that have the same characters but different capitalization.
The context I'm working on is a Rails 3.2 app. I have a simple Tag model with a Name attribute. I've inherited data for this model that did not store values case-insensitively, so some users input things like "Tree" while others input "tree" and now we have two tags that really should be one.
So, I'd like to do a query to find all these pairs so that I can go about merging them.
The only thing I can think of so far is to write a rake task that loops through them all and checks for matching values... something like:
pairs = []
Tag.all.each do |t|
other = Tag.where( 'name LIKE ?', t.name )
pairs << [t, other] if other
end
However, I'm not sure the above would work, or that it makes sense performance-wise. Is there a better way to write a SQL query that would find these matching pairs?
There is a question similar to this here
What you can do is take that answer and create a method in your model to do a case-insensitive search. In my experience ActiveRecord already searches case-insensitively (in practice this depends on the database's collation), but just in case:
def self.insensitive_find_by_tag_name(name)
Tag.where("lower(name) = ? ", name.downcase)
end
and then to remove duplicate entries, you can do something like this
Tag.transaction do
  tags = Tag.insensitive_find_by_tag_name(name)
  # keep the first tag and destroy every other one
  tags.last(tags.length - 1).each do |tag|
    tag.destroy
  end
end
Call transaction just in case anything fails so the database will rollback. Grab all tags with the same name, then delete any extra entries. If you want the remaining tag entry to be lower case then you can do
tag = tags.first
tag.name = tag.name.downcase
tag.save!
I'm not super good at SQL, but I researched this a bit and found that the COLLATE clause can be used to make string operations case-sensitive in SQL (typically, SELECT DISTINCT operations are case-insensitive).
so maybe you could try:
select distinct (name) COLLATE sql_latin1_general_cp1_cs_as
FROM (
... blah blah blah
Here is some documentation on collate:
http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html
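(assuming you're using MySQL, I guess; note that the collation name above is a SQL Server one, and MySQL has its own case-sensitive collations such as latin1_general_cs)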
Alternatively you could also reconfigure your database to be case sensitive via collate also. Then your current query might work unaltered
(assuming you have administrative permissions and ability to reconfigure)
You should use upper() or lower() functions to convert the names all to lower or upper case.
SELECT DISTINCT upper(name)
Or:
SELECT DISTINCT lower(name)
Source: http://www.postgresql.org/docs/9.1/static/functions-string.html
Another option (better for maintainability of code) is to use the CITEXT type, but to do this you have to modify your table structure: http://www.postgresql.org/docs/9.1/static/citext.html
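To get the actual groups of clashing tags rather than just the distinct names, one option is to group on the lowercased name and keep only the groups with more than one member. Below is a minimal sketch in plain Ruby over an array of names; the method name and sample data are hypothetical, and the SQL analogue would be a GROUP BY lower(name) with HAVING COUNT(*) > 1.

```ruby
# Groups names that differ only by capitalization.
# Returns a hash of lowercased name => all spellings found,
# keeping only the names that have more than one spelling.
def case_variant_groups(names)
  names.group_by { |name| name.downcase }
       .select { |_lowered, spellings| spellings.size > 1 }
end

# Hypothetical tag names, e.g. loaded with Tag.pluck(:name) in the app.
names = ["Tree", "tree", "Rock", "TREE", "river", "River"]

case_variant_groups(names).each do |lowered, spellings|
  puts "#{lowered}: #{spellings.join(', ')}"
end
```

Each resulting group is a set of tags that should be merged into one.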

In Rails, is it possible to do a query that returns all the records with a minimum number of characters in a text or string column?

I ask because a thorough Google search returns no clue as to how to do this.
I am trying to pull an example of a column field which is rarely used and is unfortunately littered with newlines and dashes even in empty ones, so I can't just ask for ones that have data. I need to ask for a column that has at least 10-15 characters or something like this. I can also imagine this query being useful for validating pre-existing data. I know about the validator that does this, but I'm not trying to validate, I'm trying to search.
Thanks!
It seems ActiveRecord does not support this directly, but you can do it anyway with a SQL fragment (MySQL example):
Model.where("CHAR_LENGTH(text_field) >= ?", 10)
In Postgres the same should work; its documentation lists char_length() as well.
Alternatively, you could store the size of the field in a callback whenever the record is saved:
before_save { |r| r.text_field_size = r.text_field.to_s.size }
With this you can query on that column, which will be DB-agnostic:
Model.where("text_field_size >= ?", 10)
I think you'll have to write some part of the request in SQL.
For MySQL, use something like :
Model.where("CHAR_LENGTH(field_name) >= ?", min_length)
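Since the column in the question is littered with newlines and dashes, a raw length check may still match rows that are effectively empty. Here is a minimal plain-Ruby sketch that counts only the meaningful characters; the helper names are hypothetical, and it filters values in memory rather than in SQL, so it suits a one-off inspection more than a large table.

```ruby
# Length of a value after dropping newlines, carriage returns, dashes
# and surrounding whitespace. nil is treated as an empty string.
def meaningful_length(text)
  text.to_s.gsub(/[\r\n-]/, "").strip.length
end

# Keeps only the values with at least min_length meaningful characters.
def values_with_min_length(values, min_length)
  values.select { |value| meaningful_length(value) >= min_length }
end

# Hypothetical column values, e.g. loaded with Model.pluck(:text_field).
values = ["--\n--\n", nil, "hello world, ok", "-- short --"]
puts values_with_min_length(values, 10).inspect
```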

Ruby followers query

We have an SQL query in our Rails 3 app.
@followers returns an array of IDs of users following the current_user.
@followers = current_user.following
@feed_items = Micropost.where("belongs_to_id IN (?)", @followers)
Is there a more efficient way to do this query?
The query you have can't really be optimized any more than it is. It could be made faster by adding an index to belongs_to_id (which you should almost always do for foreign keys anyway), but that doesn't change the actual query.
There is a cleaner way to write IN queries though:
Micropost.where(:belongs_to_id => @followers)
where @followers is an array of values for belongs_to_id.
It looks good to me.
However, if you're really looking for the minimum number of characters in the code, you could change:
Micropost.where("belongs_to_id IN (?)", @followers)
to
Micropost.where(:belongs_to_id => @followers)
which reads a little easier.
Rails will see the array and do the IN. (With a string condition such as "belongs_to_id = ?", Rails only expands the array into a comma-separated list and leaves the operator alone, so the string form has to keep the IN.)
As always, the main goal of the Ruby language is readability, so little improvements help.
As for the query being inefficient, you should look into indexes on that field.
They tend to be a little more specific to each DB, and you have only specified generic SQL in your question.

Need Pattern for dynamic search of multiple sql tables

I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual sql query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help but think there must be a more elegant solution since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the sql in code. Seems godawful. If I really try to support the requested ability to query any combination of any field in any table this is going to be one MASSIVE set of if statements. shiver
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as dynamic WHERE clauses. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. Makes testing convenient. Note, when you create your SQL queries in code, don't concatenate in user input. Use your @variables!
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
Name = COALESCE(@name, Name) AND
Nickname = COALESCE(@nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for @name and null for @nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
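The coalesce operator turns the null into an identity comparison, which is true for every row where that column is not NULL and so doesn't affect the where clause. Rows where the column itself is NULL are filtered out, because NULL = NULL does not evaluate to true, so this trick only works on columns without NULLs.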
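For the generate-it-in-code approach mentioned above, the pile of if statements can usually be replaced by a loop over the criteria that were actually supplied. This is a minimal sketch in plain Ruby under stated assumptions: the Users table is the one from the example, the column whitelist and the method name are hypothetical, and the result is a SQL string with ? placeholders plus a separate parameter list, so user input is never concatenated into the SQL.

```ruby
# Columns the caller may search on (hypothetical whitelist for the
# Users table above). Column names cannot be bound as parameters,
# so they are checked against this list instead.
ALLOWED_COLUMNS = %w[Name Nickname].freeze

# Builds [sql, params] from a hash of column => value.
# Criteria with a nil value are skipped, i.e. "don't search on this".
def build_search(criteria)
  clauses = []
  params = []
  criteria.each do |column, value|
    unless ALLOWED_COLUMNS.include?(column.to_s)
      raise ArgumentError, "unknown column: #{column}"
    end
    next if value.nil?
    clauses << "#{column} = ?"
    params << value
  end
  sql = "SELECT Name, Nickname FROM Users"
  sql += " WHERE " + clauses.join(" AND ") unless clauses.empty?
  [sql, params]
end

sql, params = build_search("Name" => "brian", "Nickname" => nil)
puts sql
puts params.inspect
```

Unlike the COALESCE form, a skipped criterion adds no condition at all, so rows with NULL in that column are not excluded.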
Search and normalization can be at odds with each other. So probably the first thing would be to get some kind of "view" that shows all the fields that can be searched as a single row, with a single key getting you the resume. Then you can throw something like Lucene in front of that to give you a full-text index of those rows. The way that works is, you ask it for "x" in this view and it returns the key to you. It's a great solution and comes recommended by Joel himself on the podcast, within the first 2 months IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example, let's imagine a resume composed of several fields:
Name,
Address,
Education (this could be a table on its own) or
Work experience (this could grow to its own table where each row represents a previous job)
So searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINs.
Instead, you could change your frame of reference and think of the whole resume as what it is, a single document, and you just want to search said document.
This is what tools like Sphinx Search do. They create a full-text index of your 'document'; you then query Sphinx and it gives you back where in the database that record was found.
Really good search results.
Don't worry about these tools not being part of your RDBMS. It will save you a lot of headaches to use the appropriate model, "documents", rather than the incorrect one, "tables", for this application.