Changing Rails Query to Pure SQL

I'm looking to move the following code away from ActiveRecord to pure SQL for a performance increase. What would be the best way to write this query in pure SQL (MySQL DB)?
User.count(:conditions => ["email = ?",params[:email]]) > 0
Thanks

Analogously to find_by_sql you can use count_by_sql:
User.count_by_sql(["SELECT COUNT(*) FROM users u WHERE u.email = ?", params[:email]]) > 0
Remember also to use the syntax ["... ? ...", var] here to protect against SQL injection.
However, I doubt that you can achieve a significant performance improvement this way. Test it. If it's not faster, stay with the ActiveRecord version or try to find a niftier solution to your problem.
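If you do want to measure it, here is a minimal benchmark sketch using Ruby's Benchmark module (the email value and iteration count are made up for illustration):

require "benchmark"

email = "user@example.com"
n = 1_000
Benchmark.bm(22) do |x|
  # Rails 2-style count with conditions, as in the question
  x.report("ActiveRecord count:") { n.times { User.count(:conditions => ["email = ?", email]) > 0 } }
  # Hand-written SQL via count_by_sql
  x.report("count_by_sql:") { n.times { User.count_by_sql(["SELECT COUNT(*) FROM users WHERE email = ?", email]) > 0 } }
end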
Edit:
If you just want to test whether a given email is already contained in the table, you could also test the performance of User.find_by_email(params[:email]).present?
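Depending on your Rails version, exists? may also be worth timing, since it issues a cheap SELECT ... LIMIT 1 instead of counting every matching row (a sketch, not a drop-in guarantee for your schema):

# exists? stops at the first matching row rather than counting them all
User.exists?(:email => params[:email])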

Related

How to write an SQL NOT EXISTS query/scope in the Rails way?

I have a database scope that filters only the latest ProxyConfig version for a particular Proxy and environment.
This is the raw SQL that works very well with MySQL, PostgreSQL and Oracle:
class ProxyConfig < ApplicationRecord
  ...
  scope :current_versions, -> do
    where %(NOT EXISTS (
      SELECT 1 FROM proxy_configs pc
      WHERE proxy_configs.environment = pc.environment
      AND proxy_configs.proxy_id = pc.proxy_id
      AND proxy_configs.version < pc.version
    ))
  end
  ...
end
You can find a simple test case in my baby_squeel issue.
But I find it nicer not to use SQL directly. I have spent a lot of time trying out different approaches to write it in the Rails way to no avail. I found generic Rails and baby_squeel examples but they always involved different tables.
PS The previous version used joins, but it was super slow and it messed up some queries; for example, #count produced an SQL syntax error. So I'm not very open to using other approaches; rather, I'd prefer to know how to implement this exact query. Although I'm at least curious to see other simple solutions.
PPS On the point that direct SQL is fine: in this case, mostly yes. Perhaps every RDBMS understands this quoting. If one needs to compare text fields, though, that requires special functions on Oracle, and on Postgres the case-insensitive LIKE is ILIKE. Arel can handle this automatically; in raw SQL it would require a different string for each RDBMS.
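For instance, a minimal Arel sketch of adapter-aware case-insensitive matching (the :name column is hypothetical):

# Arel's #matches emits ILIKE on PostgreSQL and plain LIKE on MySQL
ProxyConfig.where(ProxyConfig.arel_table[:name].matches("%edge%"))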
This isn't actually a query that you can build with the ActiveRecord Query Interface alone. It can be done with a light sprinkling of Arel though:
class ProxyConfig < ApplicationRecord
  def self.current_versions
    pc = arel_table.alias("pc")
    where(
      unscoped.select(1)
              .where(pc[:environment].eq(arel_table[:environment]))
              .where(pc[:proxy_id].eq(arel_table[:proxy_id]))
              .where(pc[:version].gt(arel_table[:version]))
              .from(pc)
              .arel.exists.not
    )
  end
end
The generated SQL isn't identical but I think it should be functionally equivalent.
SELECT "proxy_configs".* FROM "proxy_configs"
WHERE NOT (
EXISTS (
SELECT 1 FROM "proxy_configs" "pc"
WHERE "pc"."environment" = "proxy_configs"."environment"
AND "pc"."proxy_id" = "proxy_configs"."proxy_id"
AND "pc"."version" > "proxy_configs"."version"
)
)
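Since current_versions returns an ordinary relation, it should compose with the rest of the query interface, including #count; for example (the environment value is hypothetical):

ProxyConfig.current_versions.where(:environment => "production").count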

Pure SQL queries in Rails view?

I've been asked to display some data in my Rails app view using a pure SQL query, without the help of ActiveRecord. This is so that the application owner can plug in some third-party reporting tool (Pentaho or something).
I am bad at SQL and not really sure whether something like this is even possible. Does anyone have any suggestions?
If you must drop down to pure SQL, you can make life more pleasant by using find_by_sql:
bundy = MyModel.find_by_sql("SELECT my_models.id as id, my_articles.title as title from my_models, my_articles WHERE foo = 3 AND ... ...")
or similar. This will give you familiar objects that you can access with dot notation, as you'd expect. Note that find_by_sql returns an array of objects, so grab one first; the columns in the SELECT clause are then available, as long as you alias compound expressions with 'as' to give them a handle:
puts bundy.first.id
puts bundy.first.title
To find out what all the results are for a row, you can use 'attributes':
bundy.first.attributes
Also you can construct your queries using ActiveRecord, then call 'to_sql' to give you the raw SQL you're using.
sql = MyModel.joins(:my_article).where(id: [1,2,3,4,5]).to_sql
for example.
Then you could call:
MyModel.find_by_sql(sql)
Better, though, may be to just use ActiveRecord, then pass the result of 'to_sql' into whatever the reporting tool is that you need to use with it. Then your code maintains its portability and maintainability.
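If you need raw rows without any model at all, a minimal sketch using the connection directly (the table and column names are made up):

result = ActiveRecord::Base.connection.select_all(
  "SELECT my_models.id, my_articles.title " \
  "FROM my_models JOIN my_articles ON my_articles.my_model_id = my_models.id"
)
result.each do |row|
  puts row["id"], row["title"] # each row behaves like a plain Hash
end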

find_by_sql renders an array

I've got a problem here: I can't make my find_by_sql request return an ActiveRecord relation. And I do need an ActiveRecord relation to make a further request:
@searches = @searches.find_by_sql('SELECT *, COUNT( follower_id ) FROM follows GROUP BY followable_id LIMIT 0 , 3') if params[:only_famous_projects]
@project_pages = @project_pages.where(:project_id => @searches.pluck(:'followable.id')) if params[:only_famous_projects]
I can't use "pluck" without an ActiveRecord relation, so I think I have to convert my SQL request into an ActiveRecord request. However, as soon as I use "count" with ActiveRecord, I have a huge problem: I don't end up with an ActiveRecord relation, but with a Fixnum!
I don't know where to find the answer anymore; I would be really grateful if you could help me.
Thanks
find_by_sql will return an array of ActiveRecord objects (not a relation), and only if you call it on the model itself, as in YourModel.find_by_sql.
Why not use the ActiveRecord query interface? It does a good job with calculations.
UPDATED
@searches = @searches.group(:followable_id).limit(3).offset(0).count(:follower_id) if params[:only_famous_projects]
Notice that it will give you a Hash containing the count for each followable_id.
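A minimal sketch of consuming that hash downstream (the counts shown are illustrative):

counts = @searches.group(:followable_id).limit(3).count(:follower_id)
# e.g. { 12 => 40, 7 => 35, 3 => 21 } -- followable_id => follower count
@project_pages = @project_pages.where(:project_id => counts.keys) if params[:only_famous_projects]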
Isn't LIMIT 0, 3 equivalent to LIMIT 3 ?
COUNT will always return a Fixnum, because you asked the database to count the number of rows.
You should really treat find_by_sql as a last resort, as it is only meant to bypass ActiveRecord for things it cannot do. And even for things that ActiveRecord doesn't support, you can check whether the Squeel or Valium gems handle the edge case.
Another reason not to use find_by_sql is that using MySQL-specific syntax, for example, will lock you out of using other databases in the future.

Ruby followers query

We have an SQL query in our Rails 3 app.
@followers returns an array of IDs of users following the current_user.
@followers = current_user.following
@feed_items = Micropost.where("belongs_to_id IN (?)", @followers)
Is there a more efficient way to do this query?
The query you have can't really be optimized anymore than it is. It could be made faster by adding an index to belongs_to_id (which you should almost always do for foreign keys anyway), but that doesn't change the actual query.
There is a cleaner way to write IN queries though:
Micropost.where(:belongs_to_id => #followers)
where @followers is an array of values for belongs_to_id.
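As mentioned above, the real win is an index on the foreign key; a minimal migration sketch (Rails 3.1+ change syntax, class name hypothetical):

class AddIndexToMicroposts < ActiveRecord::Migration
  def change
    add_index :microposts, :belongs_to_id
  end
end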
It looks good to me.
However, if you're really after the minimum number of characters in the code, you could change:
Micropost.where("belongs_to_id IN (?)", @followers)
to
Micropost.where(:belongs_to_id => @followers)
which reads a little easier.
Rails will see the array and generate the IN for you.
As always, a main goal of Ruby is readability, so little improvements help.
As for the query being inefficient, you should look into indexes on that field. They tend to be a bit specific to each DB, and you have only specified generic SQL in your question.

How bad is my query?

Ok I need to build a query based on some user input to filter the results.
The query basically goes something like this:
SELECT * FROM my_table ORDER BY ordering_fld;
There are four text boxes in which users can choose to filter the data, meaning I'd have to dynamically build a "WHERE" clause into it for the first filter used and then "AND" clauses for each subsequent filter entered.
Because I'm too lazy to do this, I've just made every filter an "AND" clause and put a "WHERE 1" clause in the query by default.
So now I have:
SELECT * FROM my_table WHERE 1 {AND filters} ORDER BY ordering_fld;
So my question is, have I done something that will adversely affect the performance of my query or buggered anything else up in any way I should be remotely worried about?
MySQL will optimize your 1 away.
I just ran this query on my test database:
EXPLAIN EXTENDED
SELECT *
FROM t_source
WHERE 1 AND id < 100
and it gave me the following description:
select `test`.`t_source`.`id` AS `id`,`test`.`t_source`.`value` AS `value`,`test`.`t_source`.`val` AS `val`,`test`.`t_source`.`nid` AS `nid` from `test`.`t_source` where (`test`.`t_source`.`id` < 100)
As you can see, no 1 at all.
The documentation on WHERE clause optimization in MySQL mentions this:
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
Note the 5 = 5 and 5 = 6 parts in the example above.
You can EXPLAIN your query:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and see if it does anything differently, which I doubt. I would use 1=1, just so it is clearer.
You might want to add LIMIT 1000 or something for when no parameters are used; once the table gets large, will you really want to return everything?
WHERE 1 is a constant, deterministic expression which will be "optimized out" by any decent DB engine.
If there is a good way in your chosen language to avoid building SQL yourself, use that instead. I like Python and Django, and the Django ORM makes it very easy to filter results based on user input.
If you are committed to building the SQL yourself, be sure to sanitize user inputs against SQL injection, and try to encapsulate SQL building in a separate module from your filter logic.
Also, query performance should not be your concern until it becomes a problem, which it probably won't until you have thousands or millions of rows. And when it does come time to optimize, adding a few indexes on columns used for WHERE and JOIN goes a long way.
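In Rails terms, a hedged sketch of that encapsulation: keep the column names in a whitelist and let placeholders carry the values (the filter column names are hypothetical; the table and ordering column come from the question):

FILTER_COLUMNS = %w[field1 field2 field3 field4].freeze

filters = FILTER_COLUMNS.each_with_object({}) do |col, acc|
  value = params[col]
  acc[col] = value unless value.nil? || value.empty?
end

sql = "SELECT * FROM my_table"
sql += " WHERE " + filters.keys.map { |col| "#{col} = ?" }.join(" AND ") unless filters.empty?
sql += " ORDER BY ordering_fld"

rows = MyModel.find_by_sql([sql, *filters.values])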
To improve performance, use column indexes on the fields listed in the WHERE clause.
Standard SQL Injection Disclaimers here...
One thing you could do to avoid SQL injection, since you know it's only four parameters, is use a stored procedure where you pass values for the fields or NULL. I am not sure of MySQL stored proc syntax, but the query would boil down to:
SELECT *
FROM my_table
WHERE Field1 = ISNULL(@Field1, Field1)
  AND Field2 = ISNULL(@Field2, Field2)
...
ORDER BY ordering_fld
We did something similar not too long ago, and there are a few things we observed:
Setting up indexes on the columns we were (possibly) filtering on improved performance.
The WHERE 1 part can be left out completely if no filters are used (not sure if that applies to your case). It doesn't make a difference, but it 'feels' right.
SQL injection shouldn't be forgotten.
Also, if you only have 4 filters, you could build up a stored procedure and pass in null values and check for them (just like n8wrl suggested in the meantime).
That will work - some considerations:
About dynamically built SQL in general, some databases (Oracle at least) will cache execution plans for queries, so if you end up running the same query many times it won't have to completely start over from scratch. If you use dynamically built SQL, you are creating a different query each time so to the database it will look like 100 different queries instead of 100 runs of the same query.
You'd probably just need to measure the performance to find out if it works well enough for you.
Do you need all the columns? Explicitly specifying them is probably better than using * anyway, because:
You can visually see what columns are being returned
If you add or remove columns to the table later, they won't change your interface
Not bad, I didn't know this trick for getting rid of the 'is it the first filter?' question.
Though you should be ashamed of your code (^^), it doesn't do anything to performance, as any DB engine will optimize it away.
The only reason I've used WHERE 1 = 1 is for dynamic SQL; it's a hack to make appending WHERE clauses easier by using AND .... It is not something I would include in my SQL otherwise - it does nothing to affect the query overall because it always evaluates as being true and does not hit the table(s) involved so there aren't any index lookups or table scans based on it.
I can't speak to how MySQL handles optional criteria, but I know that using the following:
WHERE (@param IS NULL OR t.column = @param)
...is the typical way of handling optional parameters. COALESCE and ISNULL are not ideal because the query is still utilizing indexes (or worse, table scans) based on a sentinel value. The example I provided won't hit the table unless a value has been provided.
That said, my experience with Oracle (9i, 10g) has shown that it doesn't handle [ WHERE (@param IS NULL OR t.column = @param) ] very well. I saw a huge performance gain by converting the SQL to be dynamic, and used CONTEXT variables to determine what to add. My impression of SQL Server 2005 is that these are handled better.
I have usually done something like this:
for (int i = 0; i < numConditions; i++) {
    sql += (i == 0 ? "WHERE " : "AND ");
    sql += dbFieldNames[i] + " = " + safeVariableValues[i];
}
Makes the generated query a little cleaner.
One alternative I sometimes use is to build the WHERE clauses in an array and then join them together:
my @wherefields;
foreach $c (@conditionfields) {
    push @wherefields, "$c = ?";
}
my $sql = "select * from table";
if (@wherefields) { $sql .= " WHERE " . join(" AND ", @wherefields); }
The above is written in Perl, but most languages have some kind of join function.