Is there a way to create an arel_table from a query? - sql

I have two tables (A and B) and a relatively complicated ActiveRecord::Relation that selects from a join of these two tables. The query executes correctly with ActiveRecord::Base.connection.exec_query joined.to_sql, that is, it prints out the columns I want from each table (A.id, A.title, b.num).
I would like to then pass this "joined" table as an Arel::Table, to be used in the rest of the program. However, when I run at_j=joined.arel_table, the Arel table is created from the original table A, not from the result of the "joined" query, i.e. I get all the columns from A (not only the selected ones), and none of the columns from B.
I realise that a first step would be to create an arel table from an already filtered table, i.e. if A has columns id, title, c1, c2, c3... I would like to be able to do:
filtered=A.select(:id,:title)
at_f=filtered.arel_table
and only get id and title in at_f, but that is not what happens, I also get c1, c2, c3....
I know I could do
at_f=A.arel_table.project(:id,:title)
but this outputs an Arel::SelectManager, and I need to pass an Arel::Table (that is out of my hands).
I would also rather not build the query in Arel, because I need to modify the table A that was given as an input, and I can do that using _select! and joins!.
Is there a way to achieve this? I thought of using something like
at_f=Arel::Table.new(filtered.to_sql)
but that fails, unsurprisingly...
Thanks in advance for your help.
................................
In case this is useful, this is how I get the "joined" active record relation:
A._select!(:id,:title,'b.num')
bf=B.where(c1: 'x',c2: 'y')
num=bf.select('id_2 AS A_id, COUNT(id_2) AS num').group(:id_2)
A.joins!("LEFT OUTER JOIN (#{num.to_sql}) b ON A.id = b.A_id")
and this is the query it generates:
# A.to_sql:
SELECT `A`.`id`, `A`.`title`, `b`.`num`
FROM `A` LEFT OUTER JOIN
(SELECT id_2 AS A_id, COUNT(id_2) AS num
FROM `B` WHERE `B`.`c1` = 'x' AND `B`.`c2` = 'y'
GROUP BY `B`.`id_2`) b
ON A.id = b.A_id

Maybe I understand what you are trying for although I am not sure about the whole Arel::Table part but we can get you that AR Relation from A as follows:
b = B.arel_table
A.joins(
Arel::Nodes::OuterJoin.new(b, b.create_on(
b[:id_2].eq(A.arel_table[:id])
.and(b[:c1].eq('x'))
.and(b[:c2].eq('y'))
)))
.select(:id, :title, b[:id_2].count.as('num'))
.group(:id,:title)
This will result in an ActiveRecord::Relation object and when executed will return A objects with only the following attributes: id, title, and num.
The SQL will be:
SELECT
a.id,
a.title,
COUNT(b.id_2) AS num
FROM
a
LEFT OUTER JOIN b ON b.id_2 = a.id
AND b.c1 = 'x'
AND b.c2 = 'y'
GROUP BY
a.id,
a.title
Which is equivalent to what you have now.
If you truly want to build the query you have now we can certainly do that without issue but this is a bit cleaner.
If this is not your intended outcome please clarify and I will update accordingly.
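One point about the equivalence claimed above is worth checking against your data: for rows of A that have no matching rows in B, the original derived-table join yields NULL for num, whereas COUNT(b.id_2) in the grouped form yields 0. The sketch below replays both SQL statements in an in-memory SQLite database through Python's sqlite3 module. The sample rows, the lowercase table names and the alias s are assumptions made for illustration, and other engines may differ in quoting but not in this NULL-versus-0 behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE b (id_2 INTEGER, c1 TEXT, c2 TEXT);
    INSERT INTO a VALUES (1, 'first'), (2, 'second');
    -- Only a.id = 1 has rows in b that satisfy c1 = 'x' AND c2 = 'y'.
    INSERT INTO b VALUES (1, 'x', 'y'), (1, 'x', 'y'), (2, 'x', 'z');
""")

# Shape of the query from the question: join against a pre-aggregated subquery.
derived = conn.execute("""
    SELECT a.id, a.title, s.num
    FROM a
    LEFT OUTER JOIN (
        SELECT id_2 AS a_id, COUNT(id_2) AS num
        FROM b WHERE c1 = 'x' AND c2 = 'y'
        GROUP BY id_2
    ) s ON a.id = s.a_id
    ORDER BY a.id
""").fetchall()

# Shape of the query from the answer: join first, then aggregate.
grouped = conn.execute("""
    SELECT a.id, a.title, COUNT(b.id_2) AS num
    FROM a
    LEFT OUTER JOIN b ON b.id_2 = a.id AND b.c1 = 'x' AND b.c2 = 'y'
    GROUP BY a.id, a.title
    ORDER BY a.id
""").fetchall()

print(derived)  # [(1, 'first', 2), (2, 'second', None)]
print(grouped)  # [(1, 'first', 2), (2, 'second', 0)]
```

If the NULL matters downstream, wrapping the original num in COALESCE(num, 0) would make the two results identical for this data.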
Notes:
I would like to make it clear that at no time will you be able to "...pass an Arel::Table..." that also contains query syntax, as that is not what an Arel::Table is.
We can produce an Arel::Nodes::TableAlias which kind of duck types an Arel::Table for most intents and purposes and will allow for a query (subquery).
It may be helpful to you to know that you can convert an AR query to Arel by simply using the arel method. For Example:
arel_query = A.select(:id,:title).arel
#=> #<Arel::SelectManager:0x00007fffd541dd90 ...>
arel_query.to_sql
#=> "SELECT a.id, a.title FROM a"
UPDATE with Additional Information:
You can create an Arel::Table out of nothing: t = Arel::Table.new('c')
You can use this as a point of reference for table and column constructs e.g. t[:id] will return an Attribute and will generate SQL of c.id
Arel::Table#project - is the SELECT command and returns SelectManager (this object is the primary means of interaction with the AST and the table). SelectManager also has project to add to the current projections.
You can use a Table or a TableAlias as a source for the SelectManager using #from
c = Arel::Nodes::TableAlias.new([our query from above], t.name)
sm = Arel::SelectManager.new(c)
# alternately sm = Arel::SelectManager.new; sm.from(c)
sm.project(c[:id])
#=> "SELECT [c].[id] FROM (SELECT a.id, a.title, COUNT(b.id_2) AS num FROM a LEFT OUTER JOIN b ON b.id_2 = a.id AND b.c1 = 'x' AND b.c2 = 'y' GROUP BY a.id, a.title) [c]"
You can use the t table we created above or the c table alias to add columns, sort, etc.
sm.project(c[:title],t[:num]).order(t[:num])
sm.to_sql
#=> SELECT [c].[id], [c].[title], [c].[num] FROM (SELECT a.id, a.title, COUNT(b.id_2) AS num FROM a LEFT OUTER JOIN b ON b.id_2 = a.id AND b.c1 = 'x' AND b.c2 = 'y' GROUP BY a.id, a.title) [c] ORDER BY [c].[num]
You can access the "froms" in the SelectManager using the froms method which will give you access to the TableAlias in this case which may be what you are looking for regarding the need for an Arel::Table
sm.froms[0]
#=> #<Arel::Nodes::TableAlias:0x00007fffbddc7160 ...>

Related

Select records where every record in one-to-many join matches a condition

How can I write a SQL query that returns records from table A only if every associated record from table B matches a condition?
I'm working in Ruby, and I can encode this logic for a simple collection like so:
array_of_A.select { |a| a.associated_bs.all? { |b| b.matches_condition? } }
I am being generic in the construction, because I'm working on a general tool that will be used across a number of distinct situations.
What I know to be the case is that INNER JOIN is the equivalent of
array_of_A.select { |a| a.associated_bs.any? { |b| b.matches_condition? } }
I have tried both:
SELECT DISTINCT "A".* FROM "A"
INNER JOIN "B"
ON "B"."a_id" = "A"."id"
WHERE "B"."string" = 'STRING'
as well as:
SELECT DISTINCT "A".* FROM "A"
INNER JOIN "B"
ON "B"."a_id" = "A"."id"
AND "B"."string" = 'STRING'
In both cases (as I expected), it returned records from table A if any associated record from B matched the condition. I'm sure there's a relatively simple solution, but my understanding of SQL just isn't providing it to me at the moment. And all of my searching thru SO and Google has proven fruitless.
I would suggest the following:
select distinct a.*
from a inner join
(
select b.a_id
from b
group by b.a_id
having min(b.string) = max(b.string) and min(b.string) = 'string'
) c on a.id = c.a_id
Alternatively:
select distinct a.*
from a inner join b on a.id = b.a_id
where not exists (select 1 from b c where c.a_id = a.id and c.string <> 'string')
Note: In the above examples, only change the symbols a and b to the names of your tables; the other identifiers are merely aliases and should not be changed.
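Both suggestions can be tried out on a small data set. The following is a minimal sketch using Python's sqlite3 module with an in-memory database; the sample rows and the name column are made up for illustration. Parent 1 has only matching children, parent 2 has a mix, and parent 3 has no children at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (a_id INTEGER, string TEXT);
    INSERT INTO a VALUES (1, 'all match'), (2, 'some match'), (3, 'no children');
    INSERT INTO b VALUES (1, 'string'), (1, 'string'), (2, 'string'), (2, 'other');
""")

# First suggestion: a group qualifies when its smallest and largest value
# are both the wanted value, i.e. every value in the group is that value.
having_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT a.id
    FROM a INNER JOIN (
        SELECT b.a_id FROM b GROUP BY b.a_id
        HAVING MIN(b.string) = MAX(b.string) AND MIN(b.string) = 'string'
    ) c ON a.id = c.a_id
""")]

# Second suggestion: keep parents for which no non-matching child exists.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT a.id
    FROM a INNER JOIN b ON a.id = b.a_id
    WHERE NOT EXISTS (
        SELECT 1 FROM b c WHERE c.a_id = a.id AND c.string <> 'string'
    )
""")]

print(having_ids)      # [1]
print(not_exists_ids)  # [1]
```

Note that both queries drop parent 3, whereas Ruby's all? returns true for an empty collection. If parents without children should be included, dropping the inner join to b from the NOT EXISTS version is one way to get that behaviour.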

What's the difference between these two SQL statements?

Here are two SQL statements below which I think are equal, but when I run the scripts the second one is much slower, can anyone tell me why?
First one:
select
a.name, if(b.score1 = 0, b.score2, b.score1)
from
a, b
where
a.id = b.id
and if(b.score1 = 0, b.score2, b.score1) > 0
Second one:
select
a.name, temp.score
from
a, b,
(select if(b.score1 = 0, b.score2, b.score1) as score from b) as temp
where
a.id = b.id
and temp.score > 0
The above is a simple example. If my query is:
select a.name,
if(b.usedname1='',if(b.usedname2='',b.usedname3,b.usedname2),b.usedname1)
from a,b
where a.id=b.id and
if(b.usedname1='',if(b.usedname2='',b.usedname3,b.usedname2),b.usedname1)<>'tom';
I got 5 more used names in my table, is there any way to simplify this kind of statement?
The right way to write the query is to use proper, explicit, standard JOIN syntax.
I would write this as:
select a.name, (case when b.score1 = 0 then b.score2 else b.score1 end)
from a join
b
on a.id = b.id
where (b.score1 = 0 and b.score2 > 0) or b.score1 > 0;
I suspect that you might really want greatest() rather than a conditional expression, but that is just speculation.
The second statement has an additional join. I have no idea why you think a query with three table references and two joins would be equivalent to a query with two table references and one join.
If you'd written the same query, I'd say quite possibly nothing. Use the query plan generated by whatever DB engine you are using.
However as you have introduced temp and still joined to b they are not the same.
This is probably the same:
select a.name,temp.score from a,
(select b.id,if(b.score1=0,b.score2,b.score1) as score from b) as temp
where a.id=temp.id and temp.score>0
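The effect of the extra, unconstrained table reference can be seen by counting rows. This is a small sketch in Python's sqlite3 with invented sample data; the MySQL if() calls are written as CASE expressions so the statements run on SQLite. The third query is one way to write the derived-table form so that it matches the first statement, on the assumption that the subquery is allowed to expose b.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, score1 INTEGER, score2 INTEGER);
    INSERT INTO a VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO b VALUES (1, 5, 0), (2, 0, 7), (3, 0, 0);
""")

# Portable spelling of if(b.score1 = 0, b.score2, b.score1).
score = "CASE WHEN b.score1 = 0 THEN b.score2 ELSE b.score1 END"

first = conn.execute(f"""
    SELECT a.name, {score} FROM a, b
    WHERE a.id = b.id AND {score} > 0
""").fetchall()

# temp is never tied to a or b, so every (a, b) pair is repeated
# once for each temp row with a positive score.
second = conn.execute(f"""
    SELECT a.name, temp.score
    FROM a, b, (SELECT {score} AS score FROM b) AS temp
    WHERE a.id = b.id AND temp.score > 0
""").fetchall()

# Derived table that carries the join key and replaces b entirely.
fixed = conn.execute(f"""
    SELECT a.name, temp.score
    FROM a, (SELECT b.id, {score} AS score FROM b) AS temp
    WHERE a.id = temp.id AND temp.score > 0
""").fetchall()

print(len(first), len(second), len(fixed))  # 2 6 2
```

With three rows in each table the second statement already returns three times as many rows as the first; on real tables the product grows accordingly, which would be consistent with the slowdown described in the question.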

SQL - How to put a condition for which table is selected without left join

I have a flag in a table whose value (1 for US, or 2 for Global) indicates whether the data will be in Table A or Table B.
A solution that works is to left join both tables; however, this significantly slows down the scripts (from less than a second to over 15 seconds).
Is there any other clever way to do this? An equivalent of
join TableA only if TableCore.CountryFlag = "US"
join TableB only if TableCore.CountryFlag = "global"
Thanks a lot for the help.
You can try using this approach:
-- US data
SELECT
YourColumns
FROM
TableCore
INNER JOIN TableA AS T ON TableCore.JoinColumn = T.JoinColumn
WHERE
TableCore.CountryFlag = 'US'
UNION ALL
-- Non-US Data
SELECT
YourColumns -- These columns must match in number and datatype with previous SELECT
FROM
TableCore
INNER JOIN TableB AS T ON TableCore.JoinColumn = T.JoinColumn
WHERE
TableCore.CountryFlag = 'global'
However, if the result is still slow, you might want to check whether the TableCore table has an index on CountryFlag and JoinColumn, and whether TableA and TableB each have an index on JoinColumn.
The basic structure is:
select . . ., coalesce(a.?, b.?) as ?
from tablecore c left join
tablea a
on c.? = a.? and c.countryflag = 'US' left join
tableb b
on c.? = b.? and c.countryflag = 'global';
This version of the query can take advantage of indexes on tablea(?) and tableb(?).
If you have a complex query, this portion is probably not responsible for the performance problem.
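To compare the two answers side by side, here is a minimal sketch that runs both shapes in an in-memory SQLite database via Python's sqlite3. The id and val columns stand in for JoinColumn and YourColumns and are assumptions, as is the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablecore (id INTEGER PRIMARY KEY, countryflag TEXT);
    CREATE TABLE tablea (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE tableb (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO tablecore VALUES (1, 'US'), (2, 'global'), (3, 'US');
    INSERT INTO tablea VALUES (1, 'us-1'), (2, 'us-2'), (3, 'us-3');
    INSERT INTO tableb VALUES (1, 'gl-1'), (2, 'gl-2');
""")

# UNION ALL form: each branch joins only the table its flag points to.
union_rows = conn.execute("""
    SELECT c.id, t.val FROM tablecore c
    INNER JOIN tablea t ON c.id = t.id WHERE c.countryflag = 'US'
    UNION ALL
    SELECT c.id, t.val FROM tablecore c
    INNER JOIN tableb t ON c.id = t.id WHERE c.countryflag = 'global'
    ORDER BY 1
""").fetchall()

# LEFT JOIN form: the flag test sits in each ON clause, so at most one
# of a.val / b.val is non-NULL per core row and COALESCE picks it.
coalesce_rows = conn.execute("""
    SELECT c.id, COALESCE(a.val, b.val) AS val
    FROM tablecore c
    LEFT JOIN tablea a ON c.id = a.id AND c.countryflag = 'US'
    LEFT JOIN tableb b ON c.id = b.id AND c.countryflag = 'global'
    ORDER BY c.id
""").fetchall()

print(union_rows)     # [(1, 'us-1'), (2, 'gl-2'), (3, 'us-3')]
print(coalesce_rows)  # [(1, 'us-1'), (2, 'gl-2'), (3, 'us-3')]
```

The two forms agree here because every core row has a match. A core row with no match in its target table would be dropped by the UNION ALL form but kept with a NULL val by the LEFT JOIN form, so which one fits depends on whether such rows should appear.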

SQL: Check for 'value' is not null returns TRUE when 'value' is null

I am using SQL Server 2008, and when I perform a left join to a table where the outer table does not have any records then I am seeing weird behavior from my where clause. If I do a check for a value from my outer table being 'not null' it sometimes returns true.
select *
from foo f left join bar b on b.id=f.id
where f.id=#id and (f.status = 1 or b.Price is not null)
When my f.status = 0 and b.Price does not exist (or, appears as null in the select) this query selects records where f.status = 0 and b.Price is null, even though (FALSE OR FALSE) should be FALSE.
If I just perform this query, it works as expected and anything without a record in 'bar' does not get selected.
select *
from foo f left join bar b on b.id=f.id
where f.id=#id and b.Price is not null
Having b.Price is not null as part of an or operation seems to be causing the issue for some reason. What could be wrong with this query? I run the same query with similar data on a SQL Server 2012 machine and do not see this issue. Could it be related to the version of SQL Server I am using?
These two formulations are not the same, as you have discovered.
In the first query, price can be NULL for two reasons:
There is no match from the left join.
There is a match and b.Price is null
I highly recommend the second approach, putting the condition in the on clause. However, if you do use the first one, make the comparison to a column used in the join:
where f.id = #id and (f.status = 1 or b.id is not null)
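The two sources of NULL listed above can be told apart on a small example. The sketch below uses Python's sqlite3 with invented rows; it illustrates standard LEFT JOIN semantics only and does not attempt to reproduce the SQL Server 2008 behaviour reported in the question. The helper matching_ids is a hypothetical name introduced for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, price REAL);
    INSERT INTO foo VALUES (1, 0), (2, 0), (3, 0);
    -- foo 1: no bar row; foo 2: bar row with NULL price; foo 3: priced bar row.
    INSERT INTO bar VALUES (2, NULL), (3, 9.5);
""")

def matching_ids(condition):
    """Return foo ids kept by the WHERE clause with the given bar test."""
    sql = f"""
        SELECT f.id FROM foo f LEFT JOIN bar b ON b.id = f.id
        WHERE f.status = 1 OR {condition}
        ORDER BY f.id
    """
    return [row[0] for row in conn.execute(sql)]

by_price = matching_ids("b.price IS NOT NULL")  # needs a bar row with a price
by_key = matching_ids("b.id IS NOT NULL")       # needs any bar row

print(by_price)  # [3]
print(by_key)    # [2, 3]
```

Testing the join column answers "did the join find a row?", while testing price also rejects matched rows whose price happens to be NULL, so the choice depends on which of the two questions the filter is meant to ask.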
You could try an OUTER APPLY like this:
SELECT *
FROM foo f
OUTER APPLY (
SELECT *
FROM bar b
WHERE f.id = b.id
AND (
f.STATUS = 1
OR b.Price IS NOT NULL
)
) b
WHERE f.id = #id
I also suggest listing the columns you need instead of using *, which is bad practice. OUTER APPLY behaves much like a left join; in this case it filters the rows from the bar table and brings back only the data you need.
Would a CASE statement work?
For example:
SELECT
(etc etc code)
CASE WHEN b.Price is not null THEN 1 ELSE 0 END AS [MyBooleanCheck]
FROM (etc etc code)

Restrict SQL subquery in SELECT

I thought the subquery within the select statement will be restricted by the FROM and/or JOIN statements.
Therefore, my query always returns an error because there is more than one row in the subquery.
SELECT
dbo.Countries.Name,
dbo.Countries.ISO2,
(SELECT dbo.CountryFields.Field
FROM dbo.CountryFields
WHERE dbo.CountryFields.Field = 'Population') AS Population
FROM
dbo.CountryFields
INNER JOIN
dbo.Countries ON (dbo.CountryFields.Countries_Id = dbo.Countries.Countries_Id)
How can I restrict the number of rows in my subquery?
Do I need there also an inner join Statement inside the subquery? I hoped the subquery will inherit from normal SELECT so I don't need manual restrictions.
The column "Field" contains more than "Population" and I would like to show more rows in the SELECT statement with subselects but now ... I can't even get one column to work. :-(
I think you want something like this:
SELECT
a.Name,
a.ISO2,
(SELECT TOP 1 b.Field FROM dbo.CountryFields b WHERE b.Countries_Id = a.Countries_Id AND b.Field = 'Population') AS Population,
(SELECT TOP 1 b.Field FROM dbo.CountryFields b WHERE b.Countries_Id = a.Countries_Id AND b.Field = 'Capital') AS Capital,
(SELECT TOP 1 b.Field FROM dbo.CountryFields b WHERE b.Countries_Id = a.Countries_Id AND b.Field = 'Area') AS Area
FROM
dbo.Countries a
Of course there are ways to optimize the above query, but it's always a tradeoff between readability and speed.
Good luck!
I think that something like this is the proper query:
SELECT
C.Name,
C.ISO2,
ISNULL(CF_POP.Value,0) AS [Population],
ISNULL(CF_F2.Value,0) AS [Field2],
ISNULL(CF_F3.Value,0) AS [Field3]
FROM
dbo.Countries AS C
LEFT JOIN dbo.CountryFields AS CF_POP ON (C.Countries_Id = CF_POP.Countries_Id) AND (CF_POP.Field = 'Population')
LEFT JOIN dbo.CountryFields AS CF_F2 ON (C.Countries_Id = CF_F2.Countries_Id) AND (CF_F2.Field = 'Field2')
LEFT JOIN dbo.CountryFields AS CF_F3 ON (C.Countries_Id = CF_F3.Countries_Id) AND (CF_F3.Field = 'Field3')
In this example each row from CountryFields is joined in as a column. I use LEFT JOIN because I don't know how complete your data is (if you want to see blanks, you have to remove ISNULL). I also used a column named Value, because I suppose there must be a second column that holds the value corresponding to CountryFields.Field. This can also be done with CROSS APPLY, but in that case the syntax will be different.
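The one-LEFT-JOIN-per-field pattern can be tried on a toy data set. This is a sketch in Python's sqlite3 under the same assumption as the answer above, namely that CountryFields has a Value column next to Field. The rows are placeholders, and COALESCE is used in place of SQL Server's ISNULL so the statement runs on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Countries (Countries_Id INTEGER PRIMARY KEY, Name TEXT, ISO2 TEXT);
    -- Value is an assumed column; the question only shows Field.
    CREATE TABLE CountryFields (Countries_Id INTEGER, Field TEXT, Value INTEGER);
    INSERT INTO Countries VALUES (1, 'Country A', 'AA'), (2, 'Country B', 'BB');
    INSERT INTO CountryFields VALUES
        (1, 'Population', 100), (1, 'Area', 200), (2, 'Population', 300);
""")

# One aliased LEFT JOIN per field turns attribute rows into columns.
rows = conn.execute("""
    SELECT C.Name, C.ISO2,
           COALESCE(P.Value, 0) AS Population,
           COALESCE(A.Value, 0) AS Area
    FROM Countries AS C
    LEFT JOIN CountryFields AS P
           ON C.Countries_Id = P.Countries_Id AND P.Field = 'Population'
    LEFT JOIN CountryFields AS A
           ON C.Countries_Id = A.Countries_Id AND A.Field = 'Area'
    ORDER BY C.Countries_Id
""").fetchall()

print(rows)  # [('Country A', 'AA', 100, 200), ('Country B', 'BB', 300, 0)]
```

Country B has no Area row, so the LEFT JOIN yields NULL there and COALESCE turns it into 0; with an INNER JOIN that country would disappear from the result.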