Optimizing SQL Server view - sql

I am trying to optimize a big query that contains about 10 subqueries on a table with 30+ columns and over 2 million records.
I would like to reduce the amount of selects on this massive table but I don't really know how I could optimize following query to prevent this.
I would like to use some kind of filter on a so that I can just write down a WHERE clause in the column subquery instead of querying tableA again, but I have no clue on how I could do this:
SELECT
col1, col2,
(SELECT COUNT(col3)
FROM tableA ta
INNER JOIN tableB b ON tl.TaskId = b.col6
INNER JOIN tableC c ON b.Id = c.TaskId
WHERE c.ResultCode = 1
AND ta.col4 = a.col4
AND ta.col5 = a.col5) as Executed,
(SELECT COUNT(col3)
FROM tableA ta
INNER JOIN tableB b ON tl.TaskId = b.col6
INNER JOIN tableC c ON b.Id = c.TaskId
WHERE c.ResultCode = 9
AND ta.col4 = a.col4
AND ta.col5 = a.col5) as NotExecuted
FROM
tableA a
GROUP BY
col1, col2, col4, col5

I have few questions about your query (asked in comment), however it is might what you are looking for:
SELECT
col1 ,
col2 ,
COUNT(CASE WHEN c.ResultCode = 1 THEN 1
ELSE NULL
END) AS Executed ,
COUNT(CASE WHEN c.ResultCode = 9 THEN 1
ELSE NULL
END) AS NotExecuted
FROM
tableA a
JOIN tableB b ON a.id = b.tableA_id
JOIN tableC c ON a.id = c.tableB_id
WHERE
c.ResultCode IN ( 1, 9 )
GROUP BY
col1 ,
col2;
If you do not need it in separate columns, then you can aggregate it in separate rows. That would probably work faster:
SELECT
col1 ,
col2 ,
CASE c.ResultCode
WHEN 1 THEN 'Executed'
WHEN 9 THEN 'Not Executed'
END,
COUNT(*)
FROM
tableA a
JOIN tableB b ON a.id = b.tableA_id
JOIN tableC c ON a.id = c.tableB_id
WHERE
c.ResultCode IN ( 1, 9 )
GROUP BY
col1 ,
col2 ,
c.ResultCode;

SELECT
col1, col2, sum(case when c.ResultCode = 1 THEN 1 ELSE 0 END) as Executed,
sum(case when c.ResultCode = 9 THEN 1 ELSE 0 END) as NotExecuted
FROM tableA a
INNER JOIN tableB b ON a.TaskId = b.col6
INNER JOIN tableC c ON b.Id = c.TaskId
GROUP BY col1, col2

Related

Optimizing SQL query having DISTINCT keyword and functions

I have this query that generates about 40,000 records and the execution time of this query is about 1 minute 30 seconds.
SELECT DISTINCT
a.ID,
a.NAME,
a.DIV,
a.UID,
(select NAME from EMPLOYEE where UID= a.UID and UID<>'') as boss_id,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 1 and id = a.ID) as TERM1,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 2 and id = a.ID) as TERM2,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 3 and id = a.ID) as TERM3,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 4 and id = a.ID) as TERM4,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 5 and id = a.ID) as TERM5,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 6 and id = a.ID) as TERM6,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 7 and id = a.ID) as TERM7,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 8 and id = a.ID) as TERM8
FROM EMPLOYEE a
WHERE ID LIKE 'D%'
I tried using group by, different kinds of join to improve the execution time but couldn't succeed.Both the tables ABC and XYZ are indexed.
Also, I think that the root cause of this problem is either the DISTINCT keyword or the MAX function.
How can I optimize the above query to bring down the execution time to at least less than a minute?
Any help is appreciated.
Query is not tested, this is just an idea on how you could get this done in two different ways.
(SQL Server solutions here)
Using LEFT JOIN for each ID should look something like this:
SELECT a.ID,
a.NAME,
a.DIV,
a.UID,
b.Name as boss_id,
MAX(xyz1.create_time) as TERM1,
MAX(xyz2.create_time) as TERM2,
MAX(xyz3.create_time) as TERM3,
MAX(xyz4.create_time) as TERM4,
MAX(xyz5.create_time) as TERM5,
MAX(xyz6.create_time) as TERM6,
MAX(xyz7.create_time) as TERM7,
MAX(xyz8.create_time) as TERM8
FROM EMPLOYEE a
JOIN EMPLOYEE b on a.UID = b.UID and b.UID <> ''
LEFT JOIN XYZ xyz1 on a.ID = xyz1.ID and xyz1.XYZ_ID = 1
LEFT JOIN XYZ xyz2 on a.ID = xyz2.ID and xyz1.XYZ_ID = 2
LEFT JOIN XYZ xyz3 on a.ID = xyz3.ID and xyz1.XYZ_ID = 3
LEFT JOIN XYZ xyz4 on a.ID = xyz4.ID and xyz1.XYZ_ID = 4
LEFT JOIN XYZ xyz5 on a.ID = xyz5.ID and xyz1.XYZ_ID = 5
LEFT JOIN XYZ xyz6 on a.ID = xyz6.ID and xyz1.XYZ_ID = 6
LEFT JOIN XYZ xyz7 on a.ID = xyz7.ID and xyz1.XYZ_ID = 7
LEFT JOIN XYZ xyz8 on a.ID = xyz8.ID and xyz1.XYZ_ID = 8
WHERE a.ID LIKE 'D%'
GROUP BY a.ID, a.NAME, a.DIV, a.UID, b.Name
Using PIVOT would look something like this:
select * from (
SELECT DISTINCT
a.ID,
a.NAME,
a.DIV,
a.UID,
b.NAME as boss_id,
xyz.xyz_id,
xyz.create_time
FROM EMPLOYEE a
JOIN EMPLOYEE b on a.UID = b.UID and b.UID <> ''
LEFT JOIN (SELECT DATE(MAX(create_time)) create_time, XYZ_ID, ID
from XYZ
where XYZ_ID between 1 and 8
group by XYZ_ID, ID) xyz on a.ID = xyz1.ID
WHERE a.ID LIKE 'D%') src
PIVOT (
max(create_time) for xyz_id IN (['1'], ['2'], ['3'], ['4'],
['5'], ['6'], ['7'], ['8'])
) PIV
Give it a shot
I would recommend group by and conditional aggregation:
SELECT e.ID, e.NAME, e.DIV, e.UID,
DATE(MAX(CASE WHEN XYZ_ID = 1 THEN create_time END)) as term1,
DATE(MAX(CASE WHEN XYZ_ID = 2 THEN create_time END)) as term2,
DATE(MAX(CASE WHEN XYZ_ID = 3 THEN create_time END)) as term3,
DATE(MAX(CASE WHEN XYZ_ID = 4 THEN create_time END)) as term4,
DATE(MAX(CASE WHEN XYZ_ID = 5 THEN create_time END)) as term5,
DATE(MAX(CASE WHEN XYZ_ID = 6 THEN create_time END)) as term6,
DATE(MAX(CASE WHEN XYZ_ID = 7 THEN create_time END)) as term7,
DATE(MAX(CASE WHEN XYZ_ID = 8 THEN create_time END)) as term8
FROM EMPLOYEE e LEFT JOIN
XYZ
ON xyz.ID = e.id
WHERE e.ID LIKE 'D%'
GROUP BY e.ID, e.NAME, e.DIV, e.UID;
I don't understand the logic for boss_id, so I left that out. This should improve the performance significantly.

Find rows where one column value match and other does not

I have two tables A and B
Table A
CODE TYPE
A 1
A 2
A 3
B 1
C 1
C 2
Table B
CODE TYPE
A 1
A 2
A 4
B 2
C 1
C 3
I want to return rows where CODE is in both tables but TYPE is not and also CODE has more than one TYPE in both tables so my result would be
CODE TYPE SOURCE
A 3 Table A
A 4 Table B
C 2 Table A
C 3 Table B
Any help with this?
I think this covers both of your conditions.
select code, coalesce(typeA, typeB) as type, src
from
(
select
coalesce(a.code, b.code) as code,
a.type as typeA,
b.type as typeB,
case when b.type is null then 'A' when a.type is null then 'B' end as src,
count(a.code) over (partition by coalesce(a.code, b.code)) as countA,
count(b.code) over (partition by coalesce(a.code, b.code)) as countB
from
A a full outer join B b
on b.code = a.code and b.type = a.type
) T
where
countA >= 2 and countB >= 2
and (typeA is null or typeB is null)
You can use a full join to see if the code matches and check if the type is null on either of the tables.
select coalesce(a.code,b.code) code, coalesce(a.type,b.type) type,
case when b.type is null then 'A' when a.type is null then 'B' end src
from a
full join b on a.code = b.code and a.type = b.type
where a.type is null or b.type is null
To limit the results to codes which have more than one type, use
select x.code, coalesce(a.type,b.type) type,
case when b.type is null then 'Table A' when a.type is null then 'Table B' end src
from a
full join b on a.code = b.code and a.type = b.type
join (select a.code from a join b on a.code = b.code
group by a.code having count(*) > 1) x on x.code = a.code or x.code = b.code
where a.type is null or b.type is null
order by 1
Using union
with tu as (
select CODE, TYPE, src='Table A'
from TableA
union all
select CODE, TYPE, src='Table B'
from TableB
)
select CODE, TYPE, max(src)
from tu t1
where exists (select 1 from tu t2 where t2.CODE=t1.CODE and t2.src=t1.src and t1.TYPE <> t2.TYPE)
group by CODE, TYPE
having count(*)=1
order by CODE, TYPE

Different output when using count and group by

When trying to get a count of IDs I get a different answer when grouping by day vs when I am not.
select cv.CONV_DAY, count(distinct cv.CLICK_ID)
from
clickcache.click cc
right join(
select distinct cv.CLICK_ID, cv.CONV_DAY, cv.PIXEL_ID
from clickcache.CONVERSION cv
where cv.CLICK_ID IS NOT NULL) cv ON cv.CLICK_ID = cc.ID
where cc.ADV_ACCOUNT_ID = 25176
and cv.CONV_DAY between '2016-8-01' AND '2016-08-07'
and AMP_CLICK_STATUS_ID = 1
AND pixel_id IN
(SELECT DISTINCT conversion_pixel_id
FROM
ampx.campaign_event_funnel ef
JOIN ampx.campaign cp ON
cp.id = ef.campaign_id
AND cp.campaign_status_id = 1
WHERE
ef.account_id IN(25176)
AND include_optimization = 1 )
group by 1
order by 1 asc
This yields 170 which is the correct answer and the I want. This, on the other hand, displays 157.
select count(distinct cv.CLICK_ID)
from
clickcache.click cc
right join(
select distinct cv.CLICK_ID, cv.CONV_DAY, cv.PIXEL_ID
from clickcache.CONVERSION cv
where cv.CLICK_ID IS NOT NULL) cv ON cv.CLICK_ID = cc.ID
where cc.ADV_ACCOUNT_ID = 25176
and cv.CONV_DAY between '2016-8-01' AND '2016-08-07'
and AMP_CLICK_STATUS_ID = 1
AND pixel_id IN
(SELECT DISTINCT conversion_pixel_id
FROM
ampx.campaign_event_funnel ef
JOIN ampx.campaign cp ON
cp.id = ef.campaign_id
AND cp.campaign_status_id = 1
WHERE
ef.account_id IN(25176)
AND include_optimization = 1 )
My question is why do I get this discrepancy and how to fix it to get a proper count?
Thank you!
Your count dependents from right query, maybe you have duplicate row?
example
table1
id name value
1 2 3
table2
id name value
1 4 5
2 6 3
1 6 3
right join tables on value get result
select * from table1 a right join table2 b on a.value = b.value
1 2 3 2 6 3
1 2 3 1 6 3
select count(distinct a.value)
from (select a.id, a.name, a.value, b.id, b.name, b.value
from table1 a right join table2 b on a.value = b.value)
result is 1
select b.id, count(distinct a.value)
from (select a.id, a.name, a.value, b.id, b.name, b.value
from table1 a right join table2 b on a.value = b.value group)
group by b.id
result is two rows
2 1
1 1
My guess is that, you have a problem for this reason.

SQL aggregation query, grouping by entries in junction table

I have TableA in a many-to-many relationship with TableC via TableB. That is,
TableA TableB TableC
id | val fkeyA | fkeyC id | data
I wish the do select sum(val) on TableA, grouping by the relationship(s) to TableC. Every entry in TableA has at least one relationship with TableC. For example,
TableA
1 | 25
2 | 30
3 | 50
TableB
1 | 1
1 | 2
2 | 1
2 | 2
2 | 3
3 | 1
3 | 2
should output
75
30
since rows 1 and 3 in Table have the same relationships to TableC, but row 2 in TableA has a different relationship to TableC.
How can I write a SQL query for this?
SELECT
sum(tableA.val) as sumVal,
tableC.data
FROM
tableA
inner join tableB ON tableA.id = tableB.fkeyA
INNER JOIN tableC ON tableB.fkeyC = tableC.id
GROUP by tableC.data
edit
Ah ha - I now see what you're getting at. Let me try again:
SELECT
sum(val) as sumVal,
tableCGroup
FROM
(
SELECT
tableA.val,
(
SELECT cast(tableB.fkeyC as varchar) + ','
FROM tableB WHERE tableB.fKeyA = tableA.id
ORDER BY tableB.fkeyC
FOR XML PATH('')
) as tableCGroup
FROM
tableA
) tmp
GROUP BY
tableCGroup
Hm, in MySQL it could be written like this:
SELECT
SUM(val) AS sumVal
FROM
( SELECT
fkeyA
, GROUP_CONCAT(fkeyC ORDER BY fkeyC) AS grpC
FROM
TableB
GROUP BY
fkeyA
) AS g
JOIN
TableA a
ON a.id = g.fkeyA
GROUP BY
grpC
SELECT sum(a.val)
FROM tablea a
INNER JOIN tableb b ON (b.fKeyA = a.id)
GROUP BY b.fKeyC
It seems that is it needed to create a key_list in orther to allow group by:
75 -> key list = "1 2"
30 -> key list = "1 2 3"
Because GROUP_CONCAT don't exists in T-SQL:
WITH CTE ( Id, key_list )
AS ( SELECT TableA.id, CAST( '' AS VARCHAR(8000) )
FROM TableA
GROUP BY TableA.id
UNION ALL
SELECT TableA.id, CAST( key_list + ' ' + str(TableB.id) AS VARCHAR(8000) )
FROM CTE c
INNER JOIN TableA A
ON c.Id = A.id
INNER join TableB B
ON B.Id = A.id
WHERE A.id > c.id --avoid infinite loop
)
Select
sum( val )
from
TableA inner join
CTE on (tableA.id = CTE.id)
group by
CTE.key_list

T-SQL Removing multiple LEFT JOIN

I have such query. It returns ColA and ColB from TableA and UserName from table Users. Then it displays several fields from TableB as additional columns to results. It works but is there any better way than using these multiple LEFT JOINS ?
SELECT a.COlA, a.ColB, u.UserName,
b1.Value,
b2.Value,
b3.Value,
b4.Value,
FROM TableA a JOIN Users u ON a.UserId = u.UserId
LEFT JOIN TableB b1 ON a.EventId = b1.EventId AND b1.Code = 5
LEFT JOIN TableB b2 ON a.EventId = b2.EventId AND b2.Code = 15
LEFT JOIN TableB b3 ON a.EventId = b3.EventId AND b3.Code = 18
LEFT JOIN TableB b4 ON a.EventId = b4.EventId AND b4.Code = 40
WHERE (a.UserId = 3) ORDER BY u.UserName ASC
TableB looks like:
Id | EventId | Code | Value
----------------------------
1 | 1 | 5 | textA
2 | 1 | 15 | textB
3 | 1 | 18 | textC
Sometimes Code is missing but for each event there are no duplicated Codes (so each LEFT JOIN is just another cell in the same result record).
I cannot understand why you want to change something that is working, but here's another way (which does those LEFT joins, but in a different way):
SELECT a.COlA, a.ColB, u.UserName,
( SELECT b.Value FROM TableB b WHERE a.EventId = b.EventId AND b.Code = 5 ),
( SELECT b.Value FROM TableB b WHERE a.EventId = b.EventId AND b.Code = 15 ),
( SELECT b.Value FROM TableB b WHERE a.EventId = b.EventId AND b.Code = 18 ),
( SELECT b.Value FROM TableB b WHERE a.EventId = b.EventId AND b.Code = 40 )
FROM TableA a JOIN Users u ON a.UserId = u.UserId
WHERE (a.UserId = 3)
ORDER BY u.UserName ASC
SELECT
a.COlA, a.ColB, u.UserName
,MAX(CASE WHEN b.Value = 5 THEN b.value ELSE 0 END) AS V5
,MAX(CASE WHEN b.Value = 15 THEN b.value ELSE 0 END) AS V15
,MAX(CASE WHEN b.Value = 18 THEN b.value ELSE 0 END) AS V18
,MAX(CASE WHEN b.Value = 40 THEN b.value ELSE 0 END) AS V45
,COUNT(CASE WHEN b.Value not IN (5,15,18,40) THEN 1 ELSE NULL END) AS CountVOther
FROM TableA a
INNER JOIN Users u ON a.UserId = u.UserId
LEFT JOIN TableB b ON (a.EventId = b.EventId)
WHERE (a.UserId = 3)
GROUP BY a.colA, a.colB, u.Username
ORDER BY u.UserName ASC