Convert SQL code with multiple subqueries into a single query - sql

I'm starting to handle an old database that was generated years ago with ACCESS. All the queries were designed with the ACCESS query wizard, they seem to be very time consuming, and I would like to improve their performance.
All queries depend on at least three subqueries and I would like to rewrite the SQL code to convert them into a single query.
Here is an example of what I'm talking about:
This is the main query:
SELECT Subquery1.pid, Table4.SIB, Subquery1.event,
       Subquery1.event_date, Subquery2.GGG, Subquery3.status
FROM Subquery1
LEFT JOIN ((Table4 LEFT JOIN Subquery2 ON Table4.SIB = Subquery2.SIB)
            LEFT JOIN Subquery3 ON Table4.SIB = Subquery3.SIB)
    ON Subquery1.pid = Table4.PID;
This main query depends on three subqueries:
Subquery1
SELECT Table2.id, Table2.pid, Table2.npid, Table3.event_date,
       Table3.event, Table3.notes, Table2.other
FROM Table2
INNER JOIN Table3 ON Table2.id = Table3.subject_id
WHERE Table2.pid Is Not Null
  AND Table3.event_date > #XX/XX/XXXX#
  AND (Table3.event Like "*AAAA" Or Table3.event = "BBBB")
ORDER BY Table2.pid, Table3.event_date DESC;
Subquery2
SELECT Table1.SIB,
       IIf(Table1.GGG Like "AAA", "BBB",
           IIf(Table1.GGG Like "CCC", "BBB",
               IIf(Table1.GGG Like "DDD", "DDD", "EEE"))) AS GGG
FROM Table1;
Subquery3
SELECT Table5.SIB, Table5.PID,
       IIf(Table5.field1 Like "1", "ZZZ",
           IIf(Table5.field1 Like "2", "ZZZ",
               IIf(Table5.field1 Like "3", "ZZZ",
                   IIf(Table5.field1 Like "4", "HHH",
                       IIf(Table5.field1 Like "5", "HHH",
                           IIf(Table5.field1 Like "6", "HHH", "UUU")))))) AS SSS
FROM Table5;
What would be the best way to improve the performance of this query and convert all the subqueries into a single statement?
I can handle each subquery, but I'm having a hard time joining them together.

If this:
Table5.field1 Like "3"
is really how some of your subqueries are written (without any actual wildcard characters), you can save a lot of time by changing it to
Table5.field1="3"
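For example, Subquery3's nested IIf chain only tests exact values, so every Like can become a plain equality test. A minimal sketch using Access's Switch function, assuming field1 really does hold the exact values "1" through "6":
SELECT Table5.SIB, Table5.PID,
       Switch(Table5.field1 = "1" Or Table5.field1 = "2" Or Table5.field1 = "3", "ZZZ",
              Table5.field1 = "4" Or Table5.field1 = "5" Or Table5.field1 = "6", "HHH",
              True, "UUU") AS SSS
FROM Table5;
Switch evaluates its condition/value pairs in order and returns the value for the first condition that is true, which reads more clearly than six nested IIf calls.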

-- you can create a temporary (transient) table for each subquery; exact syntax varies by database
CREATE TEMPORARY TABLE sub1 AS
    <your first subquery goes here>;
CREATE TEMPORARY TABLE sub2 AS
    <your second subquery goes here>;
CREATE TEMPORARY TABLE sub3 AS
    <your third subquery goes here>;
-- main query to merge them into one
SELECT <column names>
FROM sub1
LEFT JOIN sub2 ON sub1.common_column = sub2.common_column
LEFT JOIN sub3 ON sub1.common_column = sub3.common_column;
-- similarly you can combine all subqueries/temporary tables
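Alternatively, the three subqueries can be inlined directly as derived tables so that everything runs as a single statement. A rough sketch for the posted example, keeping the posted names and the #XX/XX/XXXX# placeholder as-is; the Like tests on field1 are replaced by plain equality as suggested in the first answer, the ORDER BY is dropped from the derived table (it belongs on the outer query if needed), and since the main query references Subquery3.status, which does not appear in Subquery3 as posted, S3.SSS is used here instead:
SELECT S1.pid, Table4.SIB, S1.event, S1.event_date, S2.GGG, S3.SSS
FROM (
    SELECT Table2.pid, Table3.event, Table3.event_date
    FROM Table2 INNER JOIN Table3 ON Table2.id = Table3.subject_id
    WHERE Table2.pid Is Not Null
      AND Table3.event_date > #XX/XX/XXXX#
      AND (Table3.event Like "*AAAA" Or Table3.event = "BBBB")
) AS S1
LEFT JOIN (
    (Table4 LEFT JOIN (
        SELECT Table1.SIB,
               IIf(Table1.GGG Like "AAA", "BBB",
                   IIf(Table1.GGG Like "CCC", "BBB",
                       IIf(Table1.GGG Like "DDD", "DDD", "EEE"))) AS GGG
        FROM Table1
    ) AS S2 ON Table4.SIB = S2.SIB)
    LEFT JOIN (
        SELECT Table5.SIB, Table5.PID,
               IIf(Table5.field1 = "1", "ZZZ",
                   IIf(Table5.field1 = "2", "ZZZ",
                       IIf(Table5.field1 = "3", "ZZZ",
                           IIf(Table5.field1 = "4", "HHH",
                               IIf(Table5.field1 = "5", "HHH",
                                   IIf(Table5.field1 = "6", "HHH", "UUU")))))) AS SSS
        FROM Table5
    ) AS S3 ON Table4.SIB = S3.SIB
)
ON S1.pid = Table4.PID;
Whether this is actually faster than the saved queries depends mostly on indexing (PID, SIB, subject_id, event_date), since Access generally treats saved queries much like derived tables anyway.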

Related

There are a few questions about the select statement

I have a few questions about the select statement.
First of all, I have normalized 15 tables for this select query.
The problem isn't visible yet because there isn't much data, but since I'm trying to process many tables in one select query, it seems likely to cause problems later.
So I'm considering splitting the work across a few more select statements, but I want to know how that differs from doing it all at once.
Secondly, if I use joins they will be outer joins, and when joining multiple tables I'm not sure how to choose between left outer join and right outer join.
The currently created select query references 8 tables, and only one join is used.
That is, the rest of the tables currently get their data through subqueries, and those are the tables that would likely need joins.
I would appreciate it if you could point me in the right direction for writing the multiple outer joins.
Let me briefly show you some of the current select queries.
select
a.cal1,a.cal2,a.cal3,...,
(select b.cal1 from b
where a.cal4=b.cal2)
as "bcals",
(select c.cal1 from c
where a.cal5=c.cal2)
as "ccals",
....,
(select e.cal1 from e
where a.caln=e.cal2)
as "ecals",
(select sum(extract(year from age(f.endday,f.startday)))
from f
where e.cal1=a.cal1)
as "fcals",
g.cal1,g.cal2,g.cal3,...,
(select h.cal1 from h
where g.cal4=h.cal2)
as "hcals"
from a left outer join g on a.cal1=g.cal5
where a.cal1=?;
Result:
a.cal1|a.cal2|a.cal3|...|hcals
var1 |var2 |var3 |...|varn
After this, I wonder how to join the rest of the tables.
To sum up
If many tables need to be included in one select statement, what is the performance difference between running it as one complex query and dividing it into multiple select statements?
If I keep everything inside one select statement, how should the outer joins be written?
Is there a problem with the query?
Actually your code is correct, but it looks very complex, and people will find it difficult to understand. Using joins you can reduce the lines of code and also make it more readable.
SELECT
TBL1.AMOUNT T1,
TBL2.AMOUNT T2,
TBL3.AMOUNT T3
FROM TBL1
LEFT JOIN TBL2 ON TBL2.ID = TBL1.ID
LEFT JOIN TBL3 ON TBL3.ID = TBL1.ID
In the above code there are three tables and two joins. One can easily understand it and debug or make changes. Please try this approach for your code.
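Applied to the query in the question, each correlated scalar subquery could become a left outer join along these lines. This is only a sketch: the correlation key for the f aggregate is ambiguous in the posted query, so f_key below is a hypothetical placeholder, and the extract/age functions suggest PostgreSQL:
select
    a.cal1, a.cal2, a.cal3,
    b.cal1 as "bcals",
    c.cal1 as "ccals",
    e.cal1 as "ecals",
    fx.years as "fcals",
    g.cal1, g.cal2, g.cal3,
    h.cal1 as "hcals"
from a
left outer join g on a.cal1 = g.cal5
left outer join b on a.cal4 = b.cal2
left outer join c on a.cal5 = c.cal2
left outer join e on a.caln = e.cal2
left outer join (
    -- f_key is a hypothetical column linking f back to a
    select f.f_key, sum(extract(year from age(f.endday, f.startday))) as years
    from f
    group by f.f_key
) fx on fx.f_key = a.cal1
left outer join h on g.cal4 = h.cal2
where a.cal1 = ?;
Note that this only returns the same rows as the scalar-subquery version if each joined table matches at most one row per row of a; otherwise the joins duplicate rows where a scalar subquery would instead raise an error.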

Query tuning - Multiple join conditions using likes

I've hit a bit of a situation and my novice-level SQL experience has met its match.
I have a query
SELECT a.One,
a.Two,
a.Three,
a.Four,
b.One,
b.Two
FROM table1 a
INNER JOIN table2 b on b.Four = a.Nine
and b.Six like a.One
and b.Seven like b.Two
Table1 is 25000 rows
Table2 is 22 million rows
The like clause looks like 'test%', so it should be able to use the indexes I have, and I don't think I need a full-text index because the wildcard is trailing, not leading.
I have an index that exists and works very efficiently when I use a straight equals instead of a like.
When I look at the query plan, I see that I am going through every row in table2 (which surprised me). How does the inner join work in terms of what gets executed first? Does it combine the three columns as the join, or does it join on the first column, then the second, then the third?
Is there a better way to write this query?
The problem is that an index can only be used for one like 'pattern%' comparison. This is an inequality, so index usage stops at the first one.
You might have luck by changing the query to a union:
SELECT a.One, a.Two, a.Three, a.Four, b.One, b.Two
FROM table1 a
INNER JOIN table2 b
    ON b.Four = a.Nine AND b.Six LIKE a.One
UNION
SELECT a.One, a.Two, a.Three, a.Four, b.One, b.Two
FROM table1 a
INNER JOIN table2 b
    ON b.Four = a.Nine AND b.Seven LIKE b.Two;
Then, set up the indexes on a(Nine, One) and b(Four, Two). Although the two subqueries should use the indexes, you may get a lot of matches for the intermediate results slowing down the query.
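For reference, those indexes could be created along these lines (the index names are just illustrative):
CREATE INDEX ix_table1_nine_one ON table1 (Nine, One);
CREATE INDEX ix_table2_four_two ON table2 (Four, Two);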

How do I put multiple criteria for a column in a where clause?

I have five results to retrieve from a table and I want to write a stored procedure that will return all desired rows.
For now I can write the query like this:
Select * from Table where Id = 1 OR Id = 2 or Id = 3
I suppose I need to receive a list of Ids and split it, but how do I write the WHERE clause?
So, if you're just trying to learn SQL, this is a short and good example to get to know the IN operator. The following query produces the same kind of result as your attempt, with the list of Ids coming from a second table:
SELECT *
FROM TABLE
WHERE ID IN (SELECT ID FROM TABLE2)
Judging by your attempt, this might be the simplest version for you to understand, although in the future I would recommend using a JOIN.
A JOIN has the same functionality as the previous code, but will be a better alternative. If you are curious to read more about JOINs, here are a few links from the most important sources
Joins - wikipedia
and also a visual representation of how different types of JOIN work
Another way to do it. The inner join will only include rows from T1 that match up with a row from T2 via the Id field.
select T1.* from T1 inner join T2 on T1.Id = T2.Id
In practice, inner joins are usually preferable to subqueries for performance reasons.
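As for passing the list of Ids into the stored procedure, one common approach is a single delimited parameter that is split inside the procedure. A rough sketch, assuming SQL Server 2016+ (for STRING_SPLIT) and a hypothetical table name MyTable:
CREATE PROCEDURE GetRowsByIds
    @Ids NVARCHAR(MAX)   -- e.g. '1,2,3'
AS
BEGIN
    SELECT t.*
    FROM MyTable t
    INNER JOIN STRING_SPLIT(@Ids, ',') s
        ON t.Id = CAST(s.value AS INT);
END;
On older SQL Server versions a table-valued parameter or a small splitter function does the same job.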

INNER JOIN with complex condition dramatically increases the execution time

I have 2 tables with several identically named fields that need to be linked in a JOIN condition. E.g. each table has the fields P1 and P2. I want to write the following join query:
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P1 = Table2.P1
OR Table1.P2 = Table2.P2
OR Table1.P1 = Table2.P2
OR Table1.P2 = Table2.P1
When the tables are huge, this query takes a very long time to execute.
I tried to test how long a query with only one condition would take. First, I modified the tables so that all the data from P1 & P2 were copied as new rows into Table1 & Table2. So my query became simple:
SELECT ... FROM Table1 INNER JOIN Table2 ON Table1.P = Table2.P
The result was more than surprising: the execution time dropped from many hours (the 1st case) to 2-3 seconds!
Why is it so different? Does it mean that complex conditions always reduce performance? How can I improve the situation? Maybe indexing P1 and P2 will help? I want to keep the 1st DB schema and not move to a single field P.
The reason the queries are different is because of the join strategies being used by the optimizer. There are basically four ways that two tables can be joined:
"Hash join": Creates a hash table on one of the tables which it uses to look up the values in the second.
"Merge join": Sorts both tables on the key and then reads the results sequentially for the join.
"Index lookup": Uses an index to look up values in one table.
"Nested Loop": Compares each value in each table to all the values in the other table.
(And there are variations on these, such as using an index instead of a table, working with partitions, and handling multiple processors.) Unfortunately, in SQL Server Management Studio both (3) and (4) are shown as nested loop joins. If you look more closely, you can tell the difference from the parameters in the node.
In any case, your original join is one of the first three -- and it goes fast. These joins can basically only be used on "equi-joins". That is, when the condition joining the two tables includes an equality operator.
When you switch from a single equality to an "in" or set of "or" conditions, the join condition has changed from an equijoin to a non-equijoin. My observation is that SQL Server does a lousy job of optimization in this case (and, to be fair, I think other databases do pretty much the same thing). Your performance hit is the hit of going from a good join algorithm to the nested loops algorithm.
Without testing, I might suggest some of the following strategies.
Build an index on P1 and P2 in both tables. SQL Server might use the index even for a non-equijoin.
Use the union query suggested in another solution. Each query should be correctly optimized.
Assuming these are 1-1 joins, you can also do this as a set of multiple joins:
from table1 t1
left outer join table2 t2_11 on t1.p1 = t2_11.p1
left outer join table2 t2_12 on t1.p1 = t2_12.p2
left outer join table2 t2_21 on t1.p2 = t2_21.p1
left outer join table2 t2_22 on t1.p2 = t2_22.p2
And then use case/coalesce logic in the SELECT to get the value that you actually want. Although this may look more complicated, it should be quite efficient.
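A brief sketch of that coalesce step, where val stands in for whichever Table2 column you actually need:
select t1.p1, t1.p2,
       coalesce(t2_11.val, t2_12.val, t2_21.val, t2_22.val) as val
from table1 t1
left outer join table2 t2_11 on t1.p1 = t2_11.p1
left outer join table2 t2_12 on t1.p1 = t2_12.p2
left outer join table2 t2_21 on t1.p2 = t2_21.p1
left outer join table2 t2_22 on t1.p2 = t2_22.p2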
You can use 4 queries and UNION their results:
SELECT ... FROM Table1 INNER JOIN Table2 ON Table1.P1 = Table2.P1
UNION
SELECT ... FROM Table1 INNER JOIN Table2 ON Table1.P1 = Table2.P2
UNION
SELECT ... FROM Table1 INNER JOIN Table2 ON Table1.P2 = Table2.P1
UNION
SELECT ... FROM Table1 INNER JOIN Table2 ON Table1.P2 = Table2.P2
Does using CTEs help performance?
;WITH Table1_cte
AS
(
SELECT
...
[P] = P1
FROM Table1
UNION
SELECT
...
[P] = P2
FROM Table1
)
, Table2_cte
AS
(
SELECT
...
[P] = P1
FROM Table2
UNION
SELECT
...
[P] = P2
FROM Table2
)
SELECT ... FROM Table1_cte x
INNER JOIN
Table2_cte y
ON x.P = y.P
I suspect, as far as the processor is concerned, the above is just different syntax for the same complex conditions.

Reusing a select subquery/result

I am trying to optimize the speed of a query that uses a redundant query block. I am trying to do a row-wise join in SQL Server 2008 using the query below.
Select * from
    (<complex subquery>) cq
    join table1 t1 on (cq.id = t1.id)
union
Select * from
    (<complex subquery>) cq
    join table2 t2 on (cq.id = t2.id)
The <complex subquery> is exactly the same in both pieces of the union; the only difference is that it has to be joined with different tables to obtain the same columnar data.
Is there any way I can rewrite the query to make it faster without using a temporary table to cache the results?
Why not use a temporary table and see if that improves the execution stats?
In some circumstances the Query Optimizer will automatically add a spool to the plan that caches subqueries; this is basically just a temporary table, though.
Have you checked your current plan to be sure that the query is actually being evaluated more than once?
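If you do want to try it, a minimal sketch of the temporary-table approach (SQL Server syntax, with <complex subquery> left as a placeholder as in the question):
-- materialize the expensive block once
SELECT *
INTO #cq
FROM (<complex subquery>) AS cq;

SELECT * FROM #cq cq JOIN table1 t1 ON cq.id = t1.id
UNION
SELECT * FROM #cq cq JOIN table2 t2 ON cq.id = t2.id;

DROP TABLE #cq;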
Without a concrete example it's difficult to help, but try a WITH statement, something like:
WITH csq(x,y,z) AS (
    <complex subquery>
)
Select * from
    csq
    join table1 t1 on (csq.id = t1.id)
union
Select * from
    csq
    join table2 t2 on (csq.id = t2.id)
it sometimes speeds things up no end
Does the nature of your query allow you to invert it? Instead of "(join, join) union", do "(union) join"? Like:
Select * from
(<complex subquery>) cq
join (
Select * from table1 t1
union
Select * from table2 t2
) ts
on cq.id=ts.id
I'm not really sure that double evaluation of your complex subquery is actually what's wrong. But, as per your question, this form of the query would encourage SQL Server to evaluate <complex subquery> only once.