What's the difference between these two SQL statements? - sql

Here are two SQL statements that I think are equivalent, but when I run them the second one is much slower. Can anyone tell me why?
First one:
select
a.name, if(b.score1 = 0, b.score2, b.score1)
from
a, b
where
a.id = b.id
and if(b.score1 = 0, b.score2, b.score1) > 0
Second one:
select
a.name, temp.score
from
a, b,
(select if(b.score1 = 0, b.score2, b.score1) as score from b) as temp
where
a.id = b.id
and temp.score > 0
The above is a simple example. Suppose my query is:
select a.name,
if(b.usedname1='',if(b.usedname2='',b.usedname3,b.usedname2),b.usedname1)
from a,b
where a.id=b.id and
if(b.usedname1='',if(b.usedname2='',b.usedname3,b.usedname2),b.usedname1)<>'tom';
I have 5 more used-name columns in my table. Is there any way to simplify this kind of statement?

The right way to write the query is to use proper, explicit, standard JOIN syntax.
I would write this as:
select a.name, (case when b.score1 = 0 then b.score2 else b.score1 end)
from a join
b
on a.id = b.id
where (b.score1 = 0 and b.score2 > 0) or b.score1 > 0;
I suspect that you might really want greatest() rather than a conditional expression, but that is just speculation.
The second statement has an additional join. I have no idea why you think a query with three table references and two joins would be equivalent to a query with two table references and one join.
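To make the point about the extra join concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The sample rows are made up, and CASE stands in for IF() so the SQL stays portable. Because temp is never tied to a or b, every joined row is paired with every qualifying temp row.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, score1 INTEGER, score2 INTEGER);
    INSERT INTO a VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO b VALUES (1, 0, 5), (2, 7, 0), (3, 0, 0);
""")

# First statement: one join, at most one row per matching (a, b) pair.
first = conn.execute("""
    SELECT a.name, CASE WHEN b.score1 = 0 THEN b.score2 ELSE b.score1 END
    FROM a JOIN b ON a.id = b.id
    WHERE CASE WHEN b.score1 = 0 THEN b.score2 ELSE b.score1 END > 0
    ORDER BY a.id
""").fetchall()

# Second statement: temp is not correlated with a or b, so the result is
# (rows of a joined to b) x (rows of temp with score > 0).
second = conn.execute("""
    SELECT a.name, temp.score
    FROM a, b,
         (SELECT CASE WHEN b.score1 = 0 THEN b.score2 ELSE b.score1 END AS score
          FROM b) AS temp
    WHERE a.id = b.id AND temp.score > 0
""").fetchall()

print(len(first), len(second))
```

With this data the first statement returns 2 rows and the second returns 6, which is why the two are neither equivalent nor equally fast.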

If you'd written the same query, I'd say quite possibly nothing. Use the query plan generated by whatever DB engine you are using.
However as you have introduced temp and still joined to b they are not the same.
This is probably the same:
select a.name,temp.score from a,
(select b.id, if(b.score1=0,b.score2,b.score1) as score from b) as temp
where a.id=temp.id and temp.score>0
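On the follow-up about the nested if() over several used-name columns, one possible simplification is COALESCE combined with NULLIF. The sketch below uses Python's sqlite3 module with made-up rows, and CASE stands in for MySQL's IF(). It assumes the columns hold empty strings rather than NULL; with NULL values the two forms can differ, because NULLIF/COALESCE also skips NULL.

```python
import sqlite3

# Hypothetical sample data: each b row has its first non-empty name in a different column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY,
                    usedname1 TEXT, usedname2 TEXT, usedname3 TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (1, 'tom', '', ''), (2, '', 'tim', ''), (3, '', '', 'tam');
""")

# Nested conditional, mirroring the query in the question.
nested = """
    SELECT a.name,
           CASE WHEN b.usedname1 = ''
                THEN CASE WHEN b.usedname2 = '' THEN b.usedname3 ELSE b.usedname2 END
                ELSE b.usedname1 END AS used
    FROM a JOIN b ON a.id = b.id
"""

# Flat form: NULLIF turns '' into NULL, COALESCE picks the first non-NULL value.
flat = """
    SELECT a.name,
           COALESCE(NULLIF(b.usedname1, ''), NULLIF(b.usedname2, ''), b.usedname3) AS used
    FROM a JOIN b ON a.id = b.id
"""

def names(sql):
    # Wrap the query so the expression is written once and filtered by its alias.
    wrapped = f"SELECT name, used FROM ({sql}) AS t WHERE used <> 'tom' ORDER BY name"
    return conn.execute(wrapped).fetchall()

print(names(nested))
print(names(flat))
```

Adding more used-name columns then means adding one NULLIF argument per column instead of another nesting level.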

Related

Is there a way to create an arel_table from a query?

I have two tables (A and B) and a relatively complicated ActiveRecord::Relation that selects from a join of these two tables. The query executes correctly with ActiveRecord::Base.connection.exec_query joined.to_sql, that is, it prints out the columns I want from each table (A.id, A.title, b.num).
I would like to then pass this "joined" table as an Arel::Table, to be used in the rest of the program. However, when I run at_j=joined.arel_table, the Arel table is created from the original database A, not from the one resulting from the "joined" query, i.e. I get all the columns from A (not only the selected ones), and none of the columns from B.
I realise that a first step would be to create an arel table from an already filtered table, i.e. if A has columns id, title, c1, c2, c3... I would like to be able to do:
filtered=A.select(:id,:title)
at_f=filtered.arel_table
and only get id and title in at_f, but that is not what happens, I also get c1, c2, c3....
I know I could do
at_f=A.arel_table.project(:id,:title)
but this outputs an Arel::SelectManager, and I need to pass an Arel::Table (that is out of my hands).
I also would rather not build the query in Arel, because I need to modify the table A that was given as an input, and I can do that using _select! and joins!.
Is there a way to achieve this? I thought of using something like
at_f=Arel::Table.new(filtered.to_sql)
but that fails, unsurprisingly...
Thanks in advance for your help.
................................
In case this is useful, this is how I get the "joined" active record relation:
A._select!(:id,:title,'b.num')
bf=B.where(c1: 'x',c2: 'y')
num=bf.select('id_2 AS A_id, COUNT(id_2) AS num').group(:id_2)
A.joins!("LEFT OUTER JOIN (#{num.to_sql}) b ON A.id = b.A_id")
and this is the query it generates:
# A.to_sql:
SELECT `A`.`id`, `A`.`title`, `b`.`num`
FROM `A` LEFT OUTER JOIN
(SELECT id_2 AS A_id, COUNT(id_2) AS num
FROM `B` WHERE `B`.`c1` = 'x' AND `B`.`c2` = 'y'
GROUP BY `B`.`id_2`) b
ON A.id = b.A_id
I think I understand what you are trying to do, although I am not sure about the whole Arel::Table part. We can get you that AR Relation from A as follows:
b = B.arel_table
A.joins(
Arel::Nodes::OuterJoin.new(b, b.create_on(
b[:id_2].eq(A.arel_table[:id])
.and(b[:c1].eq('x'))
.and(b[:c2].eq('y'))
)))
.select(:id, :title, b[:id_2].count.as('num'))
.group(:id,:title)
This will result in an ActiveRecord::Relation object and when executed will return A objects with only the following attributes: id, title, and num.
The SQL will be:
SELECT
a.id,
a.title,
COUNT(b.id_2) AS num
FROM
a
LEFT OUTER JOIN b ON b.id_2 = a.id
AND b.c1 = 'x'
AND b.c2 = 'y'
GROUP BY
a.id,
a.title
Which is equivalent to what you have now.
If you truly want to build the query you have now we can certainly do that without issue but this is a bit cleaner.
If this is not your intended outcome please clarify and I will update accordingly.
Notes:
I would like to make it clear that at no time will you be able to "...pass an Arel::Table..." that also contains query syntax, as that is not what an Arel::Table is.
We can produce an Arel::Nodes::TableAlias which kind of duck types an Arel::Table for most intents and purposes and will allow for a query (subquery).
It may be helpful to you to know that you can convert an AR query to Arel by simply using the arel method. For Example:
arel_query = A.select(:id,:title).arel
#=> #<Arel::SelectManager:0x00007fffd541dd90 ...>
arel_query.to_sql
#=> "SELECT a.id, a.title FROM a"
UPDATE with Additional Information:
You can create an Arel::Table out of nothing: t = Arel::Table.new('c')
You can use this as a point of reference for table and column constructs e.g. t[:id] will return an Attribute and will generate SQL of c.id
Arel::Table#project - is the SELECT command and returns SelectManager (this object is the primary means of interaction with the AST and the table). SelectManager also has project to add to the current projections.
You can use a Table or a TableAlias as a source for the SelectManager using #from
c = Arel::Nodes::TableAlias.new([our query from above], t.name)
sm = Arel::SelectManager.new(c)
# alternately sm = Arel::SelectManager.new; sm.from(c)
sm.project(c[:id])
#=> "SELECT [c].[id] FROM (SELECT a.id, a.title, COUNT(b.id_2) AS num FROM a LEFT OUTER JOIN b ON b.id_2 = a.id AND b.c1 = 'x' AND b.c2 = 'y' GROUP BY a.id, a.title) [c]"
You can use the t table we created above or the c table alias to add columns, sort, etc.
sm.project(c[:title], t[:num]).order(t[:num])
sm.to_sql
#=> SELECT [c].[id], [c].[title], [c].[num] FROM (SELECT a.id, a.title, COUNT(b.id_2) AS num FROM a LEFT OUTER JOIN b ON b.id_2 = a.id AND b.c1 = 'x' AND b.c2 = 'y' GROUP BY a.id, a.title) [c] ORDER BY [c].[num]
You can access the "froms" in the SelectManager using the froms method which will give you access to the TableAlias in this case which may be what you are looking for regarding the need for an Arel::Table
sm.froms[0]
#=> #<Arel::Nodes::TableAlias:0x00007fffbddc7160 ...>

Need help in optimizing sql query

I am new to SQL and have written the query below to fetch the required results. However, it takes ages to run. Any help with optimization would be appreciated.
Below is the SQL query I am using:
SELECT
Date_trunc('week',a.pair_date) as pair_week,
a.used_code,
a.used_name,
b.line,
b.channel,
count(
case when b.sku = c.sku then used_code else null end
)
from
a
left join b on a.ma_number = b.ma_number
and (a.imei = b.set_id or a.imei = b.repair_imei
)
left join c on a.used_code = c.code
group by 1,2,3,4,5
I would rewrite the query as:
select Date_trunc('week',a.pair_date) as pair_week,
a.used_code, a.used_name, b.line, b.channel,
count(*) filter (where b.sku = c.sku)
from a left join
b
on a.ma_number = b.ma_number and
a.imei in ( b.set_id, b.repair_imei ) left join
c
on a.used_code = c.code
group by 1,2,3,4,5;
For this query, you want indexes on b(ma_number, set_id, repair_imei) and c(code, sku). However, this doesn't leave much scope for optimization.
There might be some other possibilities, depending on the tables. For instance, or/in in the on clause is usually a bad sign -- but it is unclear what your intention really is.
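As a rough check that the rewrite keeps the same results, here is a sketch using Python's sqlite3 module with made-up rows and only the columns needed for the comparison. Two things in it are assumptions: COUNT(CASE WHEN ... THEN 1 END) is used as a portable stand-in for count(*) filter (where ...), and the index names are invented.

```python
import sqlite3

# Hypothetical sample data covering a set_id match, a repair_imei match and a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ma_number INTEGER, imei TEXT, used_code TEXT);
    CREATE TABLE b (ma_number INTEGER, set_id TEXT, repair_imei TEXT, sku TEXT);
    CREATE TABLE c (code TEXT, sku TEXT);
    INSERT INTO a VALUES (1, 'i1', 'u1'), (1, 'i2', 'u1'), (2, 'i3', 'u2');
    INSERT INTO b VALUES (1, 'i1', NULL, 's1'), (1, 'zz', 'i2', 's9'), (2, 'i3', 'i3', 's2');
    INSERT INTO c VALUES ('u1', 's1'), ('u2', 's2');
    -- Indexes suggested in the answer (names are made up).
    CREATE INDEX b_lookup ON b (ma_number, set_id, repair_imei);
    CREATE INDEX c_lookup ON c (code, sku);
""")

template = """
    SELECT a.used_code, COUNT({counted})
    FROM a
    LEFT JOIN b ON a.ma_number = b.ma_number AND {imei_match}
    LEFT JOIN c ON a.used_code = c.code
    GROUP BY a.used_code
    ORDER BY a.used_code
"""

# Original shape: OR in the ON clause, COUNT over a CASE that yields used_code.
original = conn.execute(template.format(
    counted="CASE WHEN b.sku = c.sku THEN a.used_code ELSE NULL END",
    imei_match="(a.imei = b.set_id OR a.imei = b.repair_imei)",
)).fetchall()

# Rewritten shape: IN list in the ON clause, conditional count of matching rows.
rewritten = conn.execute(template.format(
    counted="CASE WHEN b.sku = c.sku THEN 1 END",
    imei_match="a.imei IN (b.set_id, b.repair_imei)",
)).fetchall()

print(original)
print(rewritten)
```

Both forms return the same counts here, which fits the answer's remark that the rewrite is mostly about readability and leaves little room for speedups beyond indexing.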

SQL: Check for 'value' is not null returns TRUE when 'value' is null

I am using SQL Server 2008, and when I perform a left join to a table where the outer table does not have any records then I am seeing weird behavior from my where clause. If I do a check for a value from my outer table being 'not null' it sometimes returns true.
select *
from foo f left join bar b on b.id=f.id
where f.id=#id and (f.status = 1 or b.Price is not null)
When my f.status = 0 and b.Price does not exist (or, appears as null in the select) this query selects records where f.status = 0 and b.Price is null, even though (FALSE OR FALSE) should be FALSE.
If I just perform this query, it works as expected and anything without a record in 'bar' does not get selected.
select *
from foo f left join bar b on b.id=f.id
where f.id=#id and b.Price is not null
Having b.Price is not null as part of an or operation seems to be causing an issue for some reason. What could be wrong with this query? I ran the same query with similar data on a SQL Server 2012 machine and did not see this issue. Could it be related to the version of SQL Server I am using?
These two formulations are not the same, as you have discovered.
In the first query, price can be NULL for two reasons:
There is no match from the left join.
There is a match and b.Price is null
I highly recommend the second approach, putting the condition in the on clause. However, if you do use the first one, make the comparison to a column used in the join:
where f.id = #id and (f.status = 1 or b.id is not null)
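The two reasons b.Price can be NULL can be seen in a small sketch. It uses Python's sqlite3 module instead of SQL Server, the rows are made up, and the @id filter is left out. It does not reproduce the version-specific behaviour from the question; it only shows how testing the join column differs from testing Price.

```python
import sqlite3

# Hypothetical data: foo 1 has no bar row, foo 2 has a bar row with a NULL price.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, price REAL);
    INSERT INTO foo VALUES (1, 0), (2, 0), (3, 0), (4, 1);
    INSERT INTO bar VALUES (2, NULL), (3, 9.5);
""")

def ids(condition):
    # Run the left join with the given WHERE condition and return matching foo ids.
    sql = f"""
        SELECT f.id
        FROM foo f LEFT JOIN bar b ON b.id = f.id
        WHERE {condition}
        ORDER BY f.id
    """
    return [row[0] for row in conn.execute(sql)]

# Testing price: excludes both "no bar row" and "bar row with NULL price".
by_price = ids("f.status = 1 OR b.price IS NOT NULL")

# Testing the join column: excludes only "no bar row".
by_join_key = ids("f.status = 1 OR b.id IS NOT NULL")

print(by_price, by_join_key)
```

Row 2 is the difference: it has a matching bar row, so b.id is not NULL, but its price is NULL.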
You could try an OUTER APPLY like this:
SELECT *
FROM foo f
OUTER APPLY (
SELECT *
FROM bar b
WHERE f.id = b.id
AND (
f.STATUS = 1
OR b.Price IS NOT NULL
)
) b
WHERE f.id = #id
I also suggest listing the columns explicitly instead of using *, which is bad practice. OUTER APPLY is similar to a left join; in this case it filters the rows from the bar table and returns only the data you need.
Would a CASE statement work?
IE
SELECT
(etc etc code)
CASE WHEN b.Price is not null THEN 1 ELSE 0 END AS [MyBooleanCheck]
FROM (etc etc code)

Restrict SQL subquery in SELECT

I thought the subquery within the select statement will be restricted by the FROM and/or JOIN statements.
Therefore, my query always returns an error because there is more than one row in the subquery.
SELECT
dbo.Countries.Name,
dbo.Countries.ISO2,
(SELECT dbo.CountryFields.Field
FROM dbo.CountryFields
WHERE dbo.CountryFields.Field = 'Population') AS Population
FROM
dbo.CountryFields
INNER JOIN
dbo.Countries ON (dbo.CountryFields.Countries_Id = dbo.Countries.Countries_Id)
How can I restrict the number of rows in my subquery?
Do I need there also an inner join Statement inside the subquery? I hoped the subquery will inherit from normal SELECT so I don't need manual restrictions.
The column "Field" contains more than "Population" and I would like to show more rows in the SELECT statement with subselects but now ... I can't even get one column to work. :-(
I think you want something like this:
SELECT
a.Name,
a.ISO2,
(SELECT TOP 1 b.Field FROM dbo.CountryFields b WHERE b.Countries_Id = a.Countries_Id AND b.Field = 'Population') AS Population,
(SELECT TOP 1 b.Field FROM dbo.CountryFields b WHERE b.Countries_Id = a.Countries_Id AND b.Field = 'Capital') AS Capital,
(SELECT TOP 1 b.Field FROM dbo.CountryFields b WHERE b.Countries_Id = a.Countries_Id AND b.Field = 'Area') AS Area
FROM
dbo.Countries a
Of course there are ways to optimize the above query, but it's always a tradeoff between readability and speed.
Good luck!
I think that something like this is the proper query:
SELECT
C.Name,
C.ISO2,
ISNULL(CF_POP.Value,0) AS [Population],
ISNULL(CF_F2.Value,0) AS [Field2],
ISNULL(CF_F3.Value,0) AS [Field3]
FROM
dbo.Countries AS C
LEFT JOIN dbo.CountryFields AS CF_POP ON (C.Countries_Id = CF_POP.Countries_Id) AND (CF_POP.Field = 'Population')
LEFT JOIN dbo.CountryFields AS CF_F2 ON (C.Countries_Id = CF_F2.Countries_Id) AND (CF_F2.Field = 'Field2')
LEFT JOIN dbo.CountryFields AS CF_F3 ON (C.Countries_Id = CF_F3.Countries_Id) AND (CF_F3.Field = 'Field3')
In this example, each matching row from CountryFields becomes a column. I use LEFT JOIN because I don't know how complete your data is (if you want to see blanks, remove ISNULL). I also added a Value column, because I assume there must be a second column that holds the value corresponding to CountryFields.Field. This can also be done with CROSS APPLY, but the syntax is different.
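A small sketch comparing the correlated-subquery form with the LEFT JOIN form, using Python's sqlite3 module and placeholder rows. As in the answer above, the Value column is an assumption about the schema, and LIMIT 1 stands in for SQL Server's TOP 1.

```python
import sqlite3

# Placeholder data; the Value column is assumed, as in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Countries (Countries_Id INTEGER PRIMARY KEY, Name TEXT, ISO2 TEXT);
    CREATE TABLE CountryFields (Countries_Id INTEGER, Field TEXT, Value TEXT);
    INSERT INTO Countries VALUES (1, 'Alpha', 'AA'), (2, 'Beta', 'BB');
    INSERT INTO CountryFields VALUES
        (1, 'Population', '100'), (1, 'Capital', 'Acity'), (2, 'Capital', 'Bcity');
""")

# Correlated subquery: the inner WHERE refers to the outer row, so it yields
# at most one value per country instead of every 'Population' row.
with_subquery = conn.execute("""
    SELECT c.Name,
           (SELECT f.Value
            FROM CountryFields f
            WHERE f.Countries_Id = c.Countries_Id AND f.Field = 'Population'
            LIMIT 1) AS Population
    FROM Countries c
    ORDER BY c.Countries_Id
""").fetchall()

# LEFT JOIN: one join per field, with the field name in the ON clause so that
# countries without that field still appear (with NULL).
with_join = conn.execute("""
    SELECT c.Name, p.Value AS Population
    FROM Countries c
    LEFT JOIN CountryFields p
           ON p.Countries_Id = c.Countries_Id AND p.Field = 'Population'
    ORDER BY c.Countries_Id
""").fetchall()

print(with_subquery)
print(with_join)
```

The key point in both forms is the explicit link on Countries_Id; a subquery in the SELECT list does not inherit the outer FROM or JOIN conditions on its own.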

How to perform CASE on values from subquery

This is going to be difficult to explain, but here goes.
I am looking to perform a CASE condition in a SELECT clause that will use the results of two calculations to determine which calculation value to return for a column value.
Maybe a code sample will help:
this works:
SELECT
A.[COLUMN1]
, B.[COLUMN1]
, CASE
WHEN A.[COLUMN2] + A.[COLUMN3] >= B.[COLUMN2] + B.[COLUMN3] THEN A.[COLUMN2] + A.[COLUMN3]
ELSE B.[COLUMN2] + B.[COLUMN3]
END
FROM
[TABLE_A] A
INNER JOIN [TABLE_B] B ON A.ID = B.ID
The problem here is that the query above, in the case statement, is forced to perform the calculation twice. Once for the WHEN clause and again for the THEN clause.
I want to do something like this, but SQL is not happy with it.
SELECT
A.[COLUMN1]
, B.[COLUMN1]
, CASE
WHEN AB.X >= AB.Y THEN AB.X
ELSE AB.Y
END
FROM ((A.[COLUMN2] + A.[COLUMN3]) X, (B.[COLUMN2] + B.[COLUMN3]) Y)
FROM
[TABLE_A] A
INNER JOIN [TABLE_B] B INNER JOIN ON A.ID = B.ID
Is this even possible? In the second example, I am calculating the values only once and referring to them in the case statement, both for the WHEN and the THEN clauses.
I would much prefer to push the calculations down into each table. This keeps the structure of the query quite similar. So, a syntactically correct (or almost correct) version would be:
SELECT A.[COLUMN1], B.[COLUMN1],
(CASE WHEN a.col_2_3 >= b.col_2_3 THEN a.col_2_3
ELSE b.col_2_3
end)
FROM (select a.*, (A.[COLUMN2] + A.[COLUMN3]) as col_2_3
from [TABLE_A] a
) a INNER JOIN
(select b.*, (B.[COLUMN2] + B.[COLUMN3]) as col_2_3
from [TABLE_B] b
)b
ON a.ID = b.ID
There are so many important factors in performance, and overhead for simple calculations is just not one of them. Reading the data and the join are way, way more expensive than simple calculations.
However, moving variables into subqueries is useful for a few reasons. First, the calculations could be more expensive (using subqueries, say). It also helps with readability and hence maintainability.
Finally, a SQL engine could decide to evaluate those expressions just once. In practice, I'm guessing that none make that trivial optimization.
You could reformulate it as this
SELECT a_column1,
b_column1,
CASE
WHEN x >= y THEN x
ELSE y
END AS foo
FROM (SELECT A.[column1] A_COLUMN1,
B.[column1] B_COLUMN1,
( A.[column2] + A.[column3] ) X,
( B.[column2] + B.[column3] ) Y
FROM [table_a] A
INNER JOIN [table_b] B
ON A.id = B.id)t
But I'm not sure it will make a difference since the operations may be performed once per row anyway
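To confirm that the derived-table version returns the same rows as the repeated-expression version, here is a sketch using Python's sqlite3 module with made-up rows. It checks equivalence only and says nothing about how SQL Server would plan either query.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, column1 TEXT, column2 INTEGER, column3 INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, column1 TEXT, column2 INTEGER, column3 INTEGER);
    INSERT INTO table_a VALUES (1, 'a1', 1, 2), (2, 'a2', 10, 20);
    INSERT INTO table_b VALUES (1, 'b1', 3, 4), (2, 'b2', 5, 5);
""")

# Sums written out in both the WHEN and the THEN/ELSE branches.
repeated = conn.execute("""
    SELECT a.column1, b.column1,
           CASE WHEN a.column2 + a.column3 >= b.column2 + b.column3
                THEN a.column2 + a.column3
                ELSE b.column2 + b.column3 END
    FROM table_a a JOIN table_b b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

# Sums named once as x and y in a derived table, then reused by the CASE.
derived = conn.execute("""
    SELECT a_column1, b_column1, CASE WHEN x >= y THEN x ELSE y END
    FROM (SELECT a.id AS id,
                 a.column1 AS a_column1,
                 b.column1 AS b_column1,
                 a.column2 + a.column3 AS x,
                 b.column2 + b.column3 AS y
          FROM table_a a JOIN table_b b ON a.id = b.id) AS t
    ORDER BY id
""").fetchall()

print(repeated)
print(derived)
```

The derived table mainly buys readability: each sum is spelled out once and the CASE reads in terms of x and y.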