Pull back rows from multiple tables with a sub-select? - sql

I have a script which generates queries in the following fashion (based on user input):
SELECT * FROM articles
WHERE (articles.skeywords_auto ilike '%pm2%')
AND spubid IN (
SELECT people.spubid FROM people
WHERE (people.slast ilike 'chow')
GROUP BY people.spubid)
LIMIT 1;
The resulting data set:
Array ( [0] =>
Array (
[spubid] => A00603
[bactive] => t
[bbatch_import] => t
[bincomplete] => t
[scitation_vis] => I,X
[dentered] => 2009-07-24 17:07:27.241975
[sentered_by] => pubs_batchadd.php
[drev] => 2009-07-24 17:07:27.241975
[srev_by] => pubs_batchadd.php
[bpeer_reviewed] => t
[sarticle] => Errata: PM2.5 and PM10 concentrations from the Qalabotjha low-smoke fuels macro-scale experiment in South Africa (vol 69, pg 1, 2001)
[spublication] => Environmental Monitoring and Assessment
[ipublisher] =>
[svolume] => 71
[sissue] =>
[spage_start] => 207
[spage_end] => 210
[bon_cover] => f
[scover_location] =>
[scover_vis] => I,X
[sabstract] =>
[sabstract_vis] => I,X
[sarticle_url] =>
[sdoi] =>
[sfile_location] =>
[sfile_name] =>
[sfile_vis] => I
[sscience_codes] =>
[skeywords_manual] =>
[skeywords_auto] => 1,5,69,2001,africa,assessment,concentrations,environmental,errata,experiment,fuels,low-smoke,macro-scale,monitoring,pg,pm10,pm2,qalabotjha,south,vol
[saward_number] =>
[snotes] =>
)
The problem is that I also need all the columns from the 'people' table (as referenced in the sub-select) to come back as part of the data set. I haven't (obviously) done much with sub-selects in the past, so this approach is very new to me. How do I pull back all the matching rows/columns from the articles table AS WELL as the rows/columns from the people table?

Are you familiar with joins? Using ANSI syntax:
SELECT DISTINCT *
FROM ARTICLES t
JOIN PEOPLE p ON p.spubid = t.spubid AND p.slast ILIKE 'chow'
WHERE t.skeywords_auto ILIKE '%pm2%'
LIMIT 1;
The DISTINCT saves you from having to define a GROUP BY for every column returned from both tables. I included it because you had a GROUP BY on your subquery; I don't know whether it was actually necessary.
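Without DISTINCT, every selected column would have to be repeated in a GROUP BY. A sketch showing just a few of the columns (the real list would include every column of both tables):
-- Sketch only: in practice every selected column from articles and
-- people would need to appear in both SELECT and GROUP BY.
SELECT t.spubid, t.sarticle, p.slast
FROM articles t
JOIN people p ON p.spubid = t.spubid AND p.slast ILIKE 'chow'
WHERE t.skeywords_auto ILIKE '%pm2%'
GROUP BY t.spubid, t.sarticle, p.slast
LIMIT 1;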

Could you not use a join instead of a sub-select in this case?
SELECT a.*, p.*
FROM articles as a
INNER JOIN people as p ON a.spubid = p.spubid
WHERE a.skeywords_auto ilike '%pm2%'
AND p.slast ilike 'chow'
LIMIT 1;

Let's start from the beginning.
You shouldn't need a GROUP BY. Use DISTINCT instead (you aren't doing any aggregating in the inner query).
To see the contents of the inner table, you actually have to join it; its columns aren't exposed unless the table appears in the FROM section. A left outer join from the people table to the articles table should be equivalent to your IN query:
SELECT *
FROM people
LEFT OUTER JOIN articles ON articles.spubid = people.spubid
WHERE people.slast ilike 'chow'
AND articles.skeywords_auto ilike '%pm2%'
LIMIT 1
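Note that because the WHERE clause also filters on articles.skeywords_auto, people rows without a matching article are dropped anyway, so the left join behaves like an inner join here; that is what makes it equivalent to the IN version. If you ever wanted to keep people rows even when no article matches, the article filter would move into the ON clause, roughly:
SELECT *
FROM people
LEFT OUTER JOIN articles
  ON articles.spubid = people.spubid
 AND articles.skeywords_auto ILIKE '%pm2%'
WHERE people.slast ILIKE 'chow'
LIMIT 1;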

Related

Is it possible to use NH QueryOver to fetch joined entities in one query?

SQL query is:
select B.* from A inner join B on A.b_id = B.id where A.x in (1,2,3)
A <-> B relation is many-to-one
I need to filter by A but fetch related B.
UPDATE:
I tried this NH QueryOver:
Session.QueryOver<A>().Where(a => a.x.IsIn(array)).JoinQueryOver(a => a.B).Select(a => a.B).List<B>()
but it results in an N+1 sequence of queries: the first one fetches the IDs of the related Bs, and the others fetch the Bs one by one by ID (analyzed via NHProf). I want it to fetch the list of Bs in one go.
UPDATE 2:
For now I have worked around this with a subquery:
Session.QueryOver(() => b)
    .WithSubquery.WhereExists(QueryOver.Of<A>()
        .Where(a => a.x.IsIn(array))
        .And(a => a.b_id == b.id)
        .Select(a => a.id))
    .List<B>()
but I still hope to see a QueryOver example without a subquery, as I tend to think the subquery is less efficient.
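The WhereExists workaround does run as a single statement; the SQL should be roughly of this shape (a sketch, not captured from NHProf; the actual aliases and column names come from the mappings):
-- Assumed shape of the generated statement; b_id stands for whatever
-- column the A -> B reference is mapped to.
SELECT this_.*
FROM B this_
WHERE EXISTS (
    SELECT a.id
    FROM A a
    WHERE a.x IN (?, ?, ?)
      AND a.b_id = this_.id
);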
This works (at least in my test application):
var list = session.QueryOver<A>()
.Where(a => a.X.IsIn(array))
.Fetch(x => x.B).Eager
.List<A>()
.Select(x => x.B);
Note that the .Select() statement is a normal LINQ statement, not part of NHibernate.
Generated SQL:
SELECT
this_.Id as Id0_1_,
this_.B as B0_1_,
this_.X as X0_1_,
b2_.Id as Id1_0_,
b2_.SomeValue as SomeValue1_0_
FROM A this_ left outer join B b2_ on this_.B=b2_.Id
WHERE this_.X in (?, ?, ?)
It's not optimal if A is a very large class, of course.
An NHibernate-only solution with a subquery:
var candidates = QueryOver.Of<A>()
.Where(a => a.X.IsIn(array))
.Select(x => x.B.Id);
var list = session.QueryOver<B>()
.WithSubquery.WhereProperty(x => x.Id).In(candidates).List();
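This issues a single SELECT on B, roughly of this shape (a sketch; the projected FK column depends on the mapping):
-- Assumed shape of the generated statement; b_id stands for whatever
-- column the A -> B reference is mapped to.
SELECT this_.*
FROM B this_
WHERE this_.id IN (
    SELECT a.b_id
    FROM A a
    WHERE a.x IN (?, ?, ?)
);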
I'll try to find the reason why the most obvious solution (just adding Fetch().Eager) doesn't work as expected. Stay tuned!

Sequel -- can I alias subqueries in a join?

Using Sequel I'd like to join two subqueries together that share some column names, and then table-qualify those columns in the select.
I understand how to do this if the two datasets are just tables. E.g. if I have a users table and an items table, with items belonging to users, and I want to list the items' names and their owners' names:
#db[:items].join(:users, :id => :user_id).
select{[items__name, users__name.as(user_name)]}
produces
SELECT "items"."name", "users"."name" AS "user_name"
FROM "items"
INNER JOIN "users" ON ("users"."id" = "items"."user_id")
as desired.
However, I'm unsure how to do this if I'm joining two arbitrary datasets representing subqueries (call them my_items and my_users).
The syntax would presumably take the form
my_items.join(my_users, :id => :user_id).
select{[ ... , ... ]}
where I would supply qualified column names to access my_users.name and my_items.name. What's the appropriate syntax to do this?
A partial solution is to use t1__name for the first argument, as it seems that the dataset supplied to a join is aliased with t1, t2, etc. But that doesn't help me qualify the item name, which I need to supply to the second argument.
I think the most desirable solution would enable me to provide aliases for the datasets in a join, e.g. like the following (though of course this doesn't work for a number of reasons)
my_items.as(alias1).join(my_users.as(alias2), :id => :user_id).
select{[alias1__name, alias2__name ]}
Is there any way to do this?
Thanks!
Update
I think from_self gets me part of the way there, e.g.
my_items.from_self(:alias => :alias1).join(my_users, :id => :user_id).
select{[alias1__name, t1__name]}
seems to do the right thing.
OK, thanks to Ronald Holshausen's hint, got it. The key is to use .from_self on the first dataset, and provide the :table_alias option in the join:
my_items.from_self(:alias => :alias1).
join(my_users, {:id => :user_id}, :table_alias => :alias2).
select(:alias1__name, :alias2__name)
yields the SQL
SELECT "alias1"."name", "alias2"."name"
FROM ( <my_items dataset> ) AS "alias1"
INNER JOIN ( <my_users dataset> ) AS "alias2"
ON ("alias2"."id" = "alias1"."user_id")
Note that the join hash (the second argument of join) needs explicit curly braces to distinguish it from the option hash that includes :table_alias.
The only way I found was to use the from method on the DB and the :table_alias option on the join method, but these don't work with models, so I had to use table_name from the model class. I.e.,
1.9.3p125 :018 > #db.from(Dw::Models::Contract.table_name => 'C1')
=> #<Sequel::SQLite::Dataset: "SELECT * FROM `vDimContract` AS 'C1'">
1.9.3p125 :019 > #db.from(Dw::Models::Contract.table_name => 'C1').join(Dw::Models::Contract.table_name, {:c1__id => :c2__id}, :table_alias => 'C2')
=> #<Sequel::SQLite::Dataset: "SELECT * FROM `vDimContract` AS 'C1' INNER JOIN `vDimContract` AS 'C2' ON (`C1`.`Id` = `C2`.`Id`)">
1.9.3p125 :020 > #db.from(Dw::Models::Contract.table_name => 'C1').join(Dw::Models::Product.table_name, {:product_id => :c1__product_id}, :table_alias => 'P1')
=> #<Sequel::SQLite::Dataset: "SELECT * FROM `vDimContract` AS 'C1' INNER JOIN `vDimProduct` AS 'P1' ON (`P1`.`ProductId` = `C1`.`ProductId`)">
The only thing I don't like about from_self is it uses a subquery:
1.9.3p125 :021 > Dw::Models::Contract.from_self(:alias => 'C1')
=> #<Sequel::SQLite::Dataset: "SELECT * FROM (SELECT * FROM `vDimContract`) AS 'C1'">

SQL problems when migrating from MySQL to PostgreSQL

I have a Ruby on Rails 2.3.x application that I'm trying to migrate from my own VPS to Heroku, including porting from SQLite (development) and MySQL (production) to Postgres.
This is a typical Rails call I'm using:
spots = Spot.paginate(:all, :include => [:thing, :user, :store, {:thing => :tags}, {:thing => :brand}], :group => :thing_id, :order => order, :conditions => conditions, :page => page, :per_page => per_page)
Question 1: I get a lot of errors like PG::Error: ERROR: column "spots.id" must appear in the GROUP BY clause or be used in an aggregate function. SQLite/MySQL was evidently more forgiving here. Of course I can easily fix these by adding the specified fields to my :group parameter, but I feel I'm messing up my code. Is there a better way?
Question 2: If I throw in all the GROUP BY columns that Postgres is missing I end up with the following statement (only :group has changed):
spots = Spot.paginate(:all, :include => [:thing, :user, :store, {:thing => :tags}, {:thing => :brand}], :group => 'thing_id,things.id,users.id,spots.id', :order => order, :conditions => conditions, :page => page, :per_page => per_page)
This in turn produces the following SQL code:
SELECT * FROM (SELECT DISTINCT ON ("spots".id) "spots".id, spots.created_at AS alias_0 FROM "spots"
LEFT OUTER JOIN "things" ON "things".id = "spots".thing_id
WHERE (spots.recommended_to_user_id = 1 OR spots.user_id IN (1) OR things.is_featured = 't')
GROUP BY thing_id,things.id,users.id,spots.id) AS id_list
ORDER BY id_list.alias_0 DESC LIMIT 16 OFFSET 0;
...which produces the error PG::Error: ERROR: missing FROM-clause entry for table "users". How can I solve this?
Question 1:
...Is there a better way?
Yes. Since PostgreSQL 9.1, listing a table's primary key in the GROUP BY clause logically covers all columns of that table. I quote the release notes for version 9.1:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
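For illustration, a query of this shape is accepted on 9.1 and later, because grouping by the primary key functionally determines every other column of that table (a sketch, assuming spots.id is the primary key):
-- Legal on PostgreSQL 9.1+: spots.id is the primary key, so all other
-- spots columns are covered by GROUP BY spots.id.
SELECT spots.*, count(things.id) AS things_count
FROM spots
LEFT JOIN things ON things.id = spots.thing_id
GROUP BY spots.id;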
Question 2:
The following statement ... produces the error
PG::Error: ERROR: missing FROM-clause entry for table "users"
How can I solve this?
First (as always!), I formatted your query to make it easier to understand. The culprit is the reference to users.id in the GROUP BY clause:
SELECT *
FROM (
SELECT DISTINCT ON (spots.id)
spots.id, spots.created_at AS alias_0
FROM spots
LEFT JOIN things ON things.id = spots.thing_id
WHERE (spots.recommended_to_user_id = 1 OR
spots.user_id IN (1) OR
things.is_featured = 't')
GROUP BY thing_id, things.id, users.id, spots.id
) id_list
ORDER BY id_list.alias_0 DESC
LIMIT 16
OFFSET 0;
It's all obvious now, right?
Well, not all of it. There is more going on: DISTINCT ON and GROUP BY in the same query, for one, which has its uses, but not here. Radically simplify to:
SELECT s.id, s.created_at AS alias_0
FROM spots s
WHERE s.recommended_to_user_id = 1 OR
s.user_id = 1 OR
EXISTS (
SELECT 1 FROM things t
WHERE t.id = s.thing_id
AND t.is_featured = 't')
ORDER BY s.created_at DESC
LIMIT 16;
The EXISTS semi-join avoids the need for a GROUP BY in the first place. This should be much faster (besides being correct), if my assumptions about the missing table definitions hold.
Going the "pure SQL" route opened up a can of worms for me, so I tried keeping the will_paginate gem and tweaking the Spot.paginate parameters instead. The :joins parameter turned out to be very helpful.
This is currently working for me:
spots = Spot.paginate(:all, :include => [:thing, {:thing => :tags}, {:thing => :brand}], :joins => [:user, :store, :thing], :group => 'thing_id,things.id,users.id,spots.id', :order => order, :conditions => conditions, :page => page, :per_page => per_page)

Complex rails find ordering

I am trying to do a find which orders results by their house name and then by the customer's last name.
Customer.find(:all,
:conditions =>['customers.id IN (?)', intersection],
:joins => 'JOIN histories ON histories.customer_id = customers.id
JOIN houses ON histories.house_id = houses.id',
:order => "houses.name ASC, customers.last_name ASC",
:select => "customers.*, histories.job_title, houses.name"
)
My problem is that this will return every history related to each customer.
If I add AND histories.finish_date IS NULL, this will prevent every history for each selected customer from being returned, but it will also stop customers in the intersection who have no history, or whose histories all have a finish_date set, from being returned.
Basically I need every customer in the intersection returned once, with their current house name (if they have one), ordered by their house name and then their last name.
So is there a way of doing this?
Here is an example
customer
id  last_name
1   franks
2   doors
3   greens
histories
id  finish_date  house_id  customer_id
1   NULL         1         1
2   NULL         2         2
3   11/03/10     2         1
4   22/04/09     1         2
NULL = current house
houses
id  name
1   a
2   b
Results
intersection = 1,2,3
last_name  house
franks     a
doors      b
greens     NULL
Thanks
I think you need to use outer joins.
For example, this should work:
Customer.find(:all,
:conditions =>['customers.id IN (?) and histories.finish_date is null', intersection],
:joins => 'LEFT OUTER JOIN histories ON histories.customer_id = customers.id
LEFT OUTER JOIN houses ON histories.house_id = houses.id',
:order => "houses.name ASC, customers.last_name ASC",
:select => "customers.*, histories.job_title, houses.name"
)
If you've got an association between Customer and History and between History and House you should be able to do :include => [:histories => :house] instead of the :joins option.
The only other thing is that the customers with no house will appear first in the list if NULL sorts before non-NULL values (as it does in MySQL). You might want to try an order option like this:
:order => 'isnull(houses.name), houses.name, customers.last_name'
to achieve what you specified.
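Note that isnull() is MySQL-specific; if your database is something else (an assumption, since the question doesn't say), the same NULLs-last ordering can be written with a plain CASE expression. A sketch of the full query with only the ordering changed:
SELECT customers.*, histories.job_title, houses.name
FROM customers
LEFT OUTER JOIN histories ON histories.customer_id = customers.id
LEFT OUTER JOIN houses ON histories.house_id = houses.id
WHERE customers.id IN (1, 2, 3)
  AND histories.finish_date IS NULL
ORDER BY CASE WHEN houses.name IS NULL THEN 1 ELSE 0 END,  -- push NULL house names last
         houses.name ASC,
         customers.last_name ASC;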
IMO it's simpler to do the sorting logic in Rails instead of the database:
customers = Customer.find(:all, :conditions => { :id => intersection }, :include => { :histories => :house })
customers = customers.sort_by do |c|
  current_history = c.histories.find_by_finish_date(nil) # returns nil if no matching record is found
  house_name = current_history && current_history.house ? current_history.house.name : ''
  [house_name, c.last_name] # sort by current house name, then by last name
end
Explanations
:conditions can take a hash { :column_name => array }, which translates into your IN where-condition.
:include pre-loads (eager loading) the tables if the corresponding associations exist. To put it another way: :joins creates INNER JOINs, while :include creates LEFT JOINs (see the SQL sketch after this list). Here we left join histories and again left join houses. You could omit the :include option, in which case Rails runs a new query each time you access a histories or house association.
sort_by lets you define custom sort criteria.
find_by_finish_date is one of Rails' dynamic finders; it is equivalent to c.histories.find(:first, :conditions => { :finish_date => nil }).
How to output: just output all of them in your view. If a customer has no histories, customer.histories is an empty array.
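Roughly the join shapes described in the :include point above (a sketch; depending on the exact Rails 2.x version and conditions, :include may instead preload the associations with separate queries):
-- :joins style: INNER JOINs, so customers without any history row
-- are dropped.
SELECT customers.*
FROM customers
INNER JOIN histories ON histories.customer_id = customers.id
INNER JOIN houses ON houses.id = histories.house_id
WHERE customers.id IN (1, 2, 3);

-- :include => { :histories => :house } style: LEFT OUTER JOINs, so
-- customers with no history at all are still returned.
SELECT customers.*, histories.*, houses.*
FROM customers
LEFT OUTER JOIN histories ON histories.customer_id = customers.id
LEFT OUTER JOIN houses ON houses.id = histories.house_id
WHERE customers.id IN (1, 2, 3);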

Complex Join Queries in Rails

I have 3 tables - venues, users, and updates (which has an integer rating column) - and I want to write a query that will return a list of all my venues as well as their average ratings, using only the most recent update for each (person, venue) pair. For example, if user 1 rates venue A at 9 am with a 4, and then rates it again at 5 pm with a 3, I only want to use the rating of 3, since it's more recent. There are also some optional conditions, such as how recent the updates must be, and an optional array of user ids that the updates' users must fall within.
Does anybody have a suggestion for the best way to write something like this so that it is clean and efficient? I have written the following named_scope, which should do the trick, but it is pretty ugly:
named_scope :with_avg_ratings, lambda { |*params|
  hash = params.first || {}
  has_params = hash[:user_ids] || hash[:time_ago]
  dir = hash[:dir] || 'DESC'
  {
    :joins => %Q{
      LEFT JOIN (select user_id, venue_id, max(updated_at) as last_updated_at from updates
                 WHERE type = 'Review' GROUP BY user_id, venue_id) lu ON lu.venue_id = venues.id
      LEFT JOIN updates ON lu.last_updated_at = updates.updated_at
        AND updates.venue_id = venues.id AND updates.user_id = lu.user_id
    },
    :select => "venues.*, ifnull(avg(rating),0) as avg_rating",
    :group => "venues.id",
    :order => "avg_rating #{dir}",
    :conditions => Condition.block { |c|
      c.or { |a|
        a.and "updates.user_id", hash[:user_ids] if hash[:user_ids]
        a.and "updates.updated_at", '>', hash[:time_ago] if hash[:time_ago]
      } if has_params
      c.or "updates.id", 'is', nil if has_params
    }
  }
}
I include the last "updates.id is null" condition because I still want the venues returned even if they don't have any updates associated with them.
Thanks,
Eric
Yikes, that looks like a job for find_by_sql to me. When you're doing something that complex, I find it's best to take the job away from ActiveRecord and DIY.
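A raw SQL starting point for find_by_sql, mirroring the joins in the named_scope above (a sketch; ifnull and the permissive GROUP BY assume MySQL, as the scope itself does):
-- Latest 'Review' update per (user, venue), averaged per venue.
SELECT venues.*, ifnull(avg(updates.rating), 0) AS avg_rating
FROM venues
LEFT JOIN (
    SELECT user_id, venue_id, max(updated_at) AS last_updated_at
    FROM updates
    WHERE type = 'Review'
    GROUP BY user_id, venue_id
) lu ON lu.venue_id = venues.id
LEFT JOIN updates
       ON updates.venue_id = venues.id
      AND updates.user_id = lu.user_id
      AND updates.updated_at = lu.last_updated_at
GROUP BY venues.id
ORDER BY avg_rating DESC;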