Nested queries using Arel (Rails 3)

For example, I have 2 models:
Purchase (belongs_to :user)
User (has_many :purchases)
I want to select all users that have at least one purchase.
In SQL I would write it like this:
SELECT * FROM `users` WHERE `id` IN (SELECT DISTINCT `buyer_id` FROM `purchases`)
And one more question: is there any full documentation or a book that covers Arel?

Hmm, I'd like to answer my question... :)
users = User.arel_table
purchases = Purchase.arel_table
buyers = purchases.project(purchases[:buyer_id]).group(purchases[:buyer_id]) # all buyer ids
busers = users.where(users[:id].in(buyers)) # the answer: users with at least one purchase

The Rails Guide has really good documentation for ARel.
http://guides.rubyonrails.org/active_record_querying.html#conditions
The Rails API is also pretty useful for some of the more obscure options. I just google a specific term with "rails api" and it comes up first.

I don't believe that the code above issues a nested query. Instead, it appears that it would issue 2 separate SQL queries. You may get comparable speed (depending on how concerned you are with performance), but with 2 round trips to the server it doesn't offer the same benefits as a nested query.
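On that reading, with the relation materialized first, the round trips would look roughly like this (a sketch; the literal ids stand in for whatever the first query returns):

-- round trip 1: collect the buyer ids
SELECT buyer_id FROM purchases GROUP BY buyer_id;
-- round trip 2: interpolate those ids into the outer query
SELECT * FROM users WHERE id IN (1, 5, 9 /* ... ids from round trip 1 */);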

Rails ActiveRecord Query: Too slow unless I split it into 2 queries. Why?

My Post model has_many :comments, and it has a search method with this query:
joins{comments.outer}.where{ (title.like_any search_words) | (body.like_any search_words) | (comments.body.like_any search_words) }
That gives the desired records: "Find all Posts where the title, body, or any of its comments' bodies (if it has comments) match any of the search_words".
The problem is: this query takes 8 seconds! I can make it run in mere milliseconds if I split it into 2 queries:
( where{ (title.like_any search_words) | (body.like_any search_words) }
+ joins{:comments}.where{ comments.body.like_any search_words } ).uniq
That gives the same results, only much faster. But it forces the ActiveRecord::Relation into an array, so I can't do stuff like .includes and pagination afterwards. I really want to keep the result as an ActiveRecord::Relation, but I want it to be faster. What would you do?
PS: I'm using Squeel here.
Thanks.
If you are using Rails 4, then query chaining is possible. Otherwise, what you can do is select only the fields you need to show, and also add proper indexes on the search fields; but make sure you check performance both before and after adding an index.
Another option is to write the required SQL query yourself and use it via the find_by_sql method in Rails.
The two separate queries, imho, will be significantly faster because they are each much simpler and much easier for the database to optimise.
So ideally what one would want is the union of those two queries. Unfortunately, afaik, this feature does not exist in ActiveRecord/Arel/Squeel, so you would have to write your own query to solve that.
Of course, I am not sure that doing the two queries, union, and then distinct won't end up at the same execution speed as the first query.
I generally use postgresql (and would recommend that), but I noticed that MySQL also supports FULLTEXT indexes, so you might want to look into that.
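If you do hand-write it, a sketch of what such a union could look like (assuming a comments.post_id foreign key and a single search term; note that plain UNION already removes duplicates):

SELECT posts.*
FROM posts
WHERE posts.title LIKE '%term%'
   OR posts.body LIKE '%term%'
UNION
SELECT posts.*
FROM posts
INNER JOIN comments ON comments.post_id = posts.id
WHERE comments.body LIKE '%term%';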

Search in huge table

I've got a table with over 1 million rows.
This table represents user information, e.g. userName, email, gender, marital status etc.
I'm going to write a search over all rows in this table, with some conditions applied.
In the simplest case, when the search is performed only on userName, it takes 4-7 seconds to find a result:
select * from u where u.name ilike " ... "
Yes, I've got indexes on some fields. I checked that they are applied using the EXPLAIN ANALYZE command.
How can the search be made faster?
I heard something about Lucene; can it help?
I'm wondering how Facebook's search works; they've got billions of users and their search is much faster.
There is great difference between these three queries:
a) SELECT * FROM u WHERE u.name LIKE "George%"
b) SELECT * FROM u WHERE u.name LIKE "%George"
c) SELECT * FROM u WHERE u.name LIKE "%George%"
a) The first will use the index on u.name (if there is one) and will be very fast.
b) The second will not be able to use any index on u.name but there are ways to circumvent that rather easily.
For example, you could add another field nameReversed in the table where REVERSE(name) is stored. With an index on that field, the query will be rewritten as (and will be as fast as the first one):
b2) SELECT * FROM u WHERE u.nameReversed LIKE REVERSE("%George")
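For completeness, a minimal sketch of the supporting schema change (column, index, and type names are assumptions, not from the question):

ALTER TABLE u ADD COLUMN nameReversed VARCHAR(255);
UPDATE u SET nameReversed = REVERSE(name);
CREATE INDEX idx_u_name_reversed ON u (nameReversed);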
c) The third query poses the greatest difficulty, as neither of the two previous indexes will be of any help and the query will scan the whole table. Alternatives are:
Using a solution dedicated to such problems (search for "full text search"), like Sphinx. See this question on SO with more details: which-is-best-search-technique-to-search-records
If your field has names only (or another limited set of words, say a few hundred different words), you could create another auxiliary table with those names (words) and store only a foreign key in table u (a sketch follows after this list).
If of course that is not the case and you have tens of thousands or millions of different words, or the field contains whole phrases, then solving the problem with many auxiliary tables amounts to creating a full text search tool for yourself. It's a nice exercise, and you won't need Sphinx (or another tool) besides the RDBMS, but it's not trivial.
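A sketch of that auxiliary-table idea for the names-only case (all names invented): the expensive LIKE "%George%" scan now runs over the few hundred rows of the small table, and the big table is reached through an indexed foreign key.

CREATE TABLE names (
  id INT PRIMARY KEY,
  name VARCHAR(100) NOT NULL UNIQUE
);
ALTER TABLE u ADD COLUMN name_id INT;
CREATE INDEX idx_u_name_id ON u (name_id);

SELECT u.*
FROM u
JOIN names n ON n.id = u.name_id
WHERE n.name LIKE "%George%";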
Take a look at
Hibernate Search
It uses Lucene but is a lot easier to implement.
Google and Facebook use different approaches: they have distributed systems. Google's BigTable is a good keyword, and the "map and reduce" concept (Apache Hadoop) is a good starting point for more research.
Try table partitioning.
In large-table scenarios it can be helpful to partition a table.
For PostgreSQL, try here: PostgreSQL Partitioning.
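As a hedged sketch of what that could look like, using PostgreSQL's declarative syntax (PostgreSQL 10+; column names assumed from the question):

CREATE TABLE users_p (
  id BIGINT,
  userName TEXT,
  email TEXT,
  gender TEXT
) PARTITION BY RANGE (userName);

CREATE TABLE users_p_a_m PARTITION OF users_p FOR VALUES FROM ('a') TO ('n');
CREATE TABLE users_p_n_z PARTITION OF users_p FOR VALUES FROM ('n') TO (MAXVALUE);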
For highly scalable, fast searches, it can sometimes be useful to adopt a NoSQL database (like Facebook does).
I heard something about Lucene; can it help?
Yes, it can. I'm sure, you will love it!
I had the same problem: a table with around 1.2 million messages. Searching through these messages took some seconds. A full text search on the "message" column needed about 10 seconds.
On the same server hardware, Lucene returns the result in about 200-400 ms.
That's very fast.
Cached results return in around 5-10 ms.
Lucene is able to connect to your SQL database (for example MySQL); it scans your database and builds a searchable index.
How you search this index depends on the kind of application.
In my case, my PHP web application uses Solr for searching inside Lucene.
http://lucene.apache.org/solr/

Is creating a view of multiple joined tables faster than getting that data directly using the join in a query?

Following is the query; every table that is hit has over 100,000 records.
SELECT b.login as userEmail, imgateway_instance_id as img, u.id as userId
FROM buddy b
INNER JOIN `user` u ON b.username = u.login
INNER JOIN bot_to_buddy btb ON b.id = btb.buddy_id
INNER JOIN bot ON btb.bot_id = bot.id
WHERE u.id IN (14242)
Using joins on tables with as large an amount of records as yours is often very slow, because the join has to go over every record in a table, which makes the query take a lot of time.
From personal experience, I would suggest that you try to cut down the results of your query by using WHERE as much as you can to filter the rows down, and then join.
No, you cannot gain performance from using a view. Behind the scenes, your original query is run when you query the view.
Sometimes using views can yield a small bit of performance, as it says in High Performance MySQL.
On the other hand, the author of the book has written this blog post:
Views as performance trouble maker
Generally speaking, this depends on how you submit your query.
The view MAY be faster:
For example, in PHP it's common practice to submit the query "dynamically" (i.e. NOT as a prepared statement).
That means MySQL has to compile the query every time you call it. When using a view, this is done once, when the view is created.
Regarding MySQL as a DBMS, I heard about performance issues with views in earlier versions. (I don't know what the current situation is, though.)
As a general rule in such questions, just benchmark your query to get real life results. Looks like you have already populated your database with a lot of data, so this should yield meaningful results. (Don't forget to disable caching in MySQL).
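One way to sidestep the (old) MySQL query cache per statement while benchmarking, reusing the question's tables:

SELECT SQL_NO_CACHE b.login AS userEmail, u.id AS userId
FROM buddy b
INNER JOIN `user` u ON b.username = u.login
WHERE u.id IN (14242);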
There's little reason why having a view run your query, instead of running the query yourself, would be any faster with MySQL.
Views in MySQL are generally so poorly implemented that we had to back out of using them for many of our projects.
Check with EXPLAIN what your query does when you place it in a view. Looking at that query, it can probably still use the proper indexes even when it's part of a view, so it'll at least not be slower.
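A sketch of that check, wrapping the question's query in a view (the view name is invented) and comparing plans:

CREATE VIEW buddy_bots AS
SELECT b.login AS userEmail, imgateway_instance_id AS img, u.id AS userId
FROM buddy b
INNER JOIN `user` u ON b.username = u.login
INNER JOIN bot_to_buddy btb ON b.id = btb.buddy_id
INNER JOIN bot ON btb.bot_id = bot.id;

-- compare this plan against EXPLAIN on the raw query
EXPLAIN SELECT * FROM buddy_bots WHERE userId IN (14242);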

long oracle query

I've got a really long and complicated query (Oracle 10g). It contains about ten select subqueries. The query works, but it's too long. Should I somehow divide this query? I mean, is there some standard for how long/complicated an SQL query may be? The query works, but it doesn't seem like the best solution to me.
For example, one subquery is repeated in it (it queries a table smaller than 20 rows); how could I make it run just once during this query?
Maybe it's too general a question.
Thanks for all the answers
Tonu
From version 9 onwards, you can factor your SQL code almost like any other code, using a feature called subquery factoring, also known as the with-clause.
The documentation: http://download.oracle.com/docs/cd/B10501_01/server.920/a96540/statements_103a.htm#2075668
An example: http://download.oracle.com/docs/cd/B10501_01/server.920/a96540/statements_103a.htm#2075888
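As a sketch of what the factored query could look like (table and column names are invented here, since the original query isn't shown): the small table is read once and referenced by name wherever the repeated subquery appeared.

WITH small_lookup AS (
  SELECT code, description
  FROM tiny_table -- the sub-20-row table from the question
)
SELECT t.id, sl.description
FROM big_table t
JOIN small_lookup sl ON sl.code = t.code;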
Regards,
Rob.
Try looking into the WITH clause; it does the subquery once, and then lets you reference the resulting rows over and over again.
I can only suggest using EXPLAIN PLAN a lot to figure out what the query optimizer is doing to reorganize the query.
An alternative approach may be to talk to the business, figure out what they truly want, and look in the system for information that is closer to the problem domain.
I once had a situation like that regarding "on time deliveries", where the definition of "on time delivery" had been butchered beyond recognition by middle management, eager to present a "good news show", and was bloated to the extreme because of special-case handling.
Pushing back, going to the management handbook, implementing the definition that was there, and using a handy aggregates table created by Oracle EBS reduced the runtime from 25 minutes to 2 seconds.

Has anyone written a higher-level query language (than SQL) that generates SQL for common tasks, on limited schemas?

SQL is the standard in query languages; however, it is sometimes a bit verbose. I am currently writing a limited query language that will make my common queries quicker to write and involve a bit less mental overhead.
If you write a query over a good database schema, you will essentially always be joining on the primary key / foreign key fields, so I think it should be unnecessary to state them each time.
So a query could look like.
select s.name, region.description from shop s
where monthly_sales.amount > 4000 and s.staff < 10
The relations would be
shop -- many to one -- region,
shop -- one to many -- monthly_sales
The equivalent SQL would be:
select distinct s.name, r.description
from shop s
join region r on s.region_id = r.region_id
join monthly_sales ms on ms.shop_id = s.shop_id
where ms.amount > 4000 and s.staff < 10
(the distinct is there because you are joining to a one-to-many table (monthly_sales) and you are not selecting any fields from that table)
I understand that the original query above may be ambiguous for certain schemas, i.e. if there are two relationship routes between two of the tables. However, there are ways around most of these, especially if you limit the allowed schemas. Most possible schemas are not worth considering anyway.
I was just wondering if there any attempts to do something like this?
(I have seen most orm solutions to making some queries easier)
EDIT: I actually really like sql. I have used orm solutions and looked at linq. The best I have seen so far is SQLalchemy (for python). However, as far as I have seen they do not offer what I am after.
Hibernate and LinqToSQL do exactly what you want
I think you'd be better off spending your time just writing more SQL and becoming more comfortable with it. Most developers I know have gone through just this progression, where their initial exposure to SQL inspires them to bypass it entirely by writing their own ORM or set of helper classes that auto-generates the SQL for them. Usually they continue adding to it and refining it until it's just as complex as SQL (if not more so). The results are sometimes fairly comical - I inherited one application that had classes named "And.cs" and "Or.cs", whose main functions were to add the words " AND " and " OR ", respectively, to a string.
SQL is designed to handle a wide variety of complexity. If your application's data design is simple, then the SQL to manipulate that data will be simple as well. It doesn't make much sense to use a different sort of query language for simple things, and then use SQL for the complex things, when SQL can handle both kinds of thing well.
I believe that any (decent) ORM would be of help here.
Entity SQL is slightly higher level (in places) than Transact SQL. Other than that, HQL, etc. For object-model approaches, LINQ (IQueryable<T>) is much higher level, allowing simple navigation:
var qry = from cust in db.Customers
select cust.Orders.Sum(o => o.OrderValue);
etc
Martin Fowler poured a whole load of energy into this and produced the Active Record pattern. I think this is what you're looking for?
Not sure if this falls in what you are looking for but I've been generating SQL dynamically from the definition of the Data Access Objects; the idea is to reflect on the class and by default assume that its name is the table name and all properties are columns. I also have search criteria objects to build the where part. The DAOs may contain lists of other DAO classes and that directs the joins.
Since you asked for something to take care of most of the repetitive SQL, this approach does it. And when it doesn't, I just fall back on handwritten SQL or stored procedures.