Optimizing an SQL query that uses the DISTINCT keyword and functions

I have this query that returns about 40,000 records, and it takes about 1 minute 30 seconds to execute.
SELECT DISTINCT
a.ID,
a.NAME,
a.DIV,
a.UID,
(select NAME from EMPLOYEE where UID= a.UID and UID<>'') as boss_id,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 1 and id = a.ID) as TERM1,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 2 and id = a.ID) as TERM2,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 3 and id = a.ID) as TERM3,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 4 and id = a.ID) as TERM4,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 5 and id = a.ID) as TERM5,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 6 and id = a.ID) as TERM6,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 7 and id = a.ID) as TERM7,
(select DATE(MAX(create_time)) from XYZ where XYZ_ID= 8 and id = a.ID) as TERM8
FROM EMPLOYEE a
WHERE ID LIKE 'D%'
I tried using GROUP BY and different kinds of joins to improve the execution time, but without success. Both tables, ABC and XYZ, are indexed.
Also, I think that the root cause of this problem is either the DISTINCT keyword or the MAX function.
How can I optimize the above query to bring the execution time under a minute?
Any help is appreciated.

These queries are untested; they are just ideas for two different ways you could do this (SQL Server syntax).
Using a separate LEFT JOIN for each XYZ_ID should look something like this:
SELECT a.ID,
a.NAME,
a.DIV,
a.UID,
b.Name as boss_id,
MAX(xyz1.create_time) as TERM1,
MAX(xyz2.create_time) as TERM2,
MAX(xyz3.create_time) as TERM3,
MAX(xyz4.create_time) as TERM4,
MAX(xyz5.create_time) as TERM5,
MAX(xyz6.create_time) as TERM6,
MAX(xyz7.create_time) as TERM7,
MAX(xyz8.create_time) as TERM8
FROM EMPLOYEE a
LEFT JOIN EMPLOYEE b on a.UID = b.UID and b.UID <> ''
LEFT JOIN XYZ xyz1 on a.ID = xyz1.ID and xyz1.XYZ_ID = 1
LEFT JOIN XYZ xyz2 on a.ID = xyz2.ID and xyz2.XYZ_ID = 2
LEFT JOIN XYZ xyz3 on a.ID = xyz3.ID and xyz3.XYZ_ID = 3
LEFT JOIN XYZ xyz4 on a.ID = xyz4.ID and xyz4.XYZ_ID = 4
LEFT JOIN XYZ xyz5 on a.ID = xyz5.ID and xyz5.XYZ_ID = 5
LEFT JOIN XYZ xyz6 on a.ID = xyz6.ID and xyz6.XYZ_ID = 6
LEFT JOIN XYZ xyz7 on a.ID = xyz7.ID and xyz7.XYZ_ID = 7
LEFT JOIN XYZ xyz8 on a.ID = xyz8.ID and xyz8.XYZ_ID = 8
WHERE a.ID LIKE 'D%'
GROUP BY a.ID, a.NAME, a.DIV, a.UID, b.Name
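One cost of this approach is easy to check: each LEFT JOIN multiplies the rows per employee before GROUP BY collapses them, so with eight joins the intermediate row count is the product of the eight per-term row counts. The sketch below uses Python's built-in sqlite3 module as a stand-in for the real database, keeps only two of the eight joins, and runs against a made-up miniature EMPLOYEE and XYZ (all data here is hypothetical).

```python
import sqlite3

# Hypothetical miniature schema: only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (ID TEXT, NAME TEXT, UID TEXT);
CREATE TABLE XYZ (ID TEXT, XYZ_ID INTEGER, create_time TEXT);
INSERT INTO EMPLOYEE VALUES ('D1', 'Ann', 'u1');
-- Made-up events: 3 rows for XYZ_ID 1 and 4 rows for XYZ_ID 2.
INSERT INTO XYZ VALUES
  ('D1', 1, '2020-01-01'), ('D1', 1, '2020-02-01'), ('D1', 1, '2020-03-01'),
  ('D1', 2, '2020-01-05'), ('D1', 2, '2020-02-05'),
  ('D1', 2, '2020-03-05'), ('D1', 2, '2020-04-05');
""")

# Two of the eight per-term joins, each filtered on its own alias.
JOINS = """
FROM EMPLOYEE a
LEFT JOIN XYZ xyz1 ON a.ID = xyz1.ID AND xyz1.XYZ_ID = 1
LEFT JOIN XYZ xyz2 ON a.ID = xyz2.ID AND xyz2.XYZ_ID = 2
WHERE a.ID LIKE 'D%'
"""

# Rows the joins produce before GROUP BY collapses them.
raw_rows = conn.execute("SELECT COUNT(*) " + JOINS).fetchone()[0]

# The grouped query still returns the right maxima.
terms = conn.execute(
    "SELECT a.ID, MAX(xyz1.create_time), MAX(xyz2.create_time) "
    + JOINS + " GROUP BY a.ID"
).fetchall()

print(raw_rows)  # 12, i.e. 3 * 4 joined rows for a single employee
print(terms)     # [('D1', '2020-03-01', '2020-04-05')]
```

The result is correct, but the work grows multiplicatively with each extra term, which is worth keeping in mind when comparing this version with the others.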
Using PIVOT would look something like this:
select ID, NAME, DIV, UID, boss_id,
[1] as TERM1, [2] as TERM2, [3] as TERM3, [4] as TERM4,
[5] as TERM5, [6] as TERM6, [7] as TERM7, [8] as TERM8
from (
SELECT DISTINCT
a.ID,
a.NAME,
a.DIV,
a.UID,
b.NAME as boss_id,
xyz.XYZ_ID,
xyz.create_time
FROM EMPLOYEE a
LEFT JOIN EMPLOYEE b on a.UID = b.UID and b.UID <> ''
LEFT JOIN (SELECT CAST(MAX(create_time) AS DATE) create_time, XYZ_ID, ID
from XYZ
where XYZ_ID between 1 and 8
group by XYZ_ID, ID) xyz on a.ID = xyz.ID
WHERE a.ID LIKE 'D%') src
PIVOT (
max(create_time) for XYZ_ID IN ([1], [2], [3], [4],
[5], [6], [7], [8])
) PIV
Give it a shot

I would recommend group by and conditional aggregation:
SELECT e.ID, e.NAME, e.DIV, e.UID,
DATE(MAX(CASE WHEN XYZ_ID = 1 THEN create_time END)) as term1,
DATE(MAX(CASE WHEN XYZ_ID = 2 THEN create_time END)) as term2,
DATE(MAX(CASE WHEN XYZ_ID = 3 THEN create_time END)) as term3,
DATE(MAX(CASE WHEN XYZ_ID = 4 THEN create_time END)) as term4,
DATE(MAX(CASE WHEN XYZ_ID = 5 THEN create_time END)) as term5,
DATE(MAX(CASE WHEN XYZ_ID = 6 THEN create_time END)) as term6,
DATE(MAX(CASE WHEN XYZ_ID = 7 THEN create_time END)) as term7,
DATE(MAX(CASE WHEN XYZ_ID = 8 THEN create_time END)) as term8
FROM EMPLOYEE e LEFT JOIN
XYZ
ON xyz.ID = e.id
WHERE e.ID LIKE 'D%'
GROUP BY e.ID, e.NAME, e.DIV, e.UID;
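I don't understand the logic for boss_id, so I left it out. Because XYZ is joined once and all eight terms are computed from the same group of rows, instead of running eight correlated subqueries per employee, this should improve performance significantly.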
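To check that conditional aggregation returns the same terms as the original correlated subqueries, here is a small sketch using Python's sqlite3 module as a stand-in database (SQLite also has a DATE() function that truncates a timestamp string to its date part). The rows are hypothetical, and only three of the eight terms are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (ID TEXT, NAME TEXT, DIV TEXT, UID TEXT);
CREATE TABLE XYZ (ID TEXT, XYZ_ID INTEGER, create_time TEXT);
-- Hypothetical sample rows.
INSERT INTO EMPLOYEE VALUES
  ('D1', 'Ann', 'Sales', 'u1'), ('D2', 'Bob', 'Ops', 'u2'), ('X9', 'Cy', 'Ops', 'u3');
INSERT INTO XYZ VALUES
  ('D1', 1, '2020-01-01 09:00:00'), ('D1', 1, '2020-03-01 17:30:00'),
  ('D1', 2, '2020-02-10 08:00:00'), ('D2', 8, '2021-05-05 12:00:00');
""")

# Original shape: one correlated subquery per term.
correlated = conn.execute("""
SELECT a.ID,
  (SELECT DATE(MAX(create_time)) FROM XYZ WHERE XYZ_ID = 1 AND ID = a.ID) AS TERM1,
  (SELECT DATE(MAX(create_time)) FROM XYZ WHERE XYZ_ID = 2 AND ID = a.ID) AS TERM2,
  (SELECT DATE(MAX(create_time)) FROM XYZ WHERE XYZ_ID = 8 AND ID = a.ID) AS TERM8
FROM EMPLOYEE a
WHERE a.ID LIKE 'D%'
ORDER BY a.ID
""").fetchall()

# Conditional aggregation: XYZ is joined once and each term is picked
# out of the same group of rows with a CASE expression.
conditional = conn.execute("""
SELECT e.ID,
  DATE(MAX(CASE WHEN XYZ_ID = 1 THEN create_time END)) AS TERM1,
  DATE(MAX(CASE WHEN XYZ_ID = 2 THEN create_time END)) AS TERM2,
  DATE(MAX(CASE WHEN XYZ_ID = 8 THEN create_time END)) AS TERM8
FROM EMPLOYEE e
LEFT JOIN XYZ ON XYZ.ID = e.ID
WHERE e.ID LIKE 'D%'
GROUP BY e.ID
ORDER BY e.ID
""").fetchall()

print(conditional)  # [('D1', '2020-03-01', '2020-02-10', None), ('D2', None, None, '2021-05-05')]
```

Terms with no matching rows come back as NULL (None in Python) in both versions, the same as the scalar subqueries in the original.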

Related

Select multiple count(*) in multiple tables with single query

I have 3 tables:
Basic
id | name  | description
--------------------------
2  | Name1 | description2
3  | Name2 | description3
LinkA
id | linkA_ID
-------------
2  | 344
3  | 3221
2  | 6642
3  | 2312
2  | 323
LinkB
id | linkB_ID
-------------
2  | 8287
3  | 42466
2  | 616422
3  | 531
2  | 2555
2  | 8592
3  | 1122
2  | 33345
I want to get results like the table below:
id | name  | description  | linkA_count | linkB_count
------------------------------------------------------
2  | Name1 | description2 | 3           | 2
3  | Name2 | description3 | 5           | 3
My query:
SELECT
a.id
,a.name
,a.description
,COUNT(b.linkA_ID) AS linkA_count
,COUNT(c.linkB_ID) AS linkb_count
FROM
basic a
JOIN linkA b on (a.id = b.id)
JOIN linkb c on (a.id = c.id)
GROUP BY
a.id
,a.name
,a.description
The query always returns the same number for linkA_count and linkB_count.
A more traditional approach is to use "derived tables" (subqueries) so that the counts are computed before the joins multiply the rows. Using left joins allows all ids in basic to be returned by the query, even if there are no related rows in either joined table.
select
basic.id
, coalesce(a.LinkACount,0) LinkACount
, coalesce(b.linkBCount,0) linkBCount
from basic
left join (
select id, Count(linkA_ID) LinkACount from LinkA group by id
) as a on a.id=basic.id
left join (
select id, Count(linkB_ID) LinkBCount from LinkB group by id
) as b on b.id=basic.id
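Here is a sketch that runs both the question's query and the derived-table version against the sample rows from the question, using Python's sqlite3 module as a stand-in for the real database. With these rows, id 2 has three LinkA rows and five LinkB rows, and id 3 has two and three, so the question's joins produce 3 x 5 and 2 x 3 combined rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE basic (id INTEGER, name TEXT, description TEXT);
CREATE TABLE LinkA (id INTEGER, linkA_ID INTEGER);
CREATE TABLE LinkB (id INTEGER, linkB_ID INTEGER);
INSERT INTO basic VALUES (2, 'Name1', 'description2'), (3, 'Name2', 'description3');
INSERT INTO LinkA VALUES (2, 344), (3, 3221), (2, 6642), (3, 2312), (2, 323);
INSERT INTO LinkB VALUES (2, 8287), (3, 42466), (2, 616422), (3, 531),
                         (2, 2555), (2, 8592), (3, 1122), (2, 33345);
""")

# The question's query: both joins run before COUNT, so each id gets
# (LinkA rows) x (LinkB rows) joined rows and both counts equal that product.
naive = conn.execute("""
SELECT a.id, COUNT(b.linkA_ID), COUNT(c.linkB_ID)
FROM basic a
JOIN LinkA b ON a.id = b.id
JOIN LinkB c ON a.id = c.id
GROUP BY a.id ORDER BY a.id
""").fetchall()

# Derived tables: count each link table per id first, then join.
derived = conn.execute("""
SELECT basic.id,
       COALESCE(a.LinkACount, 0) AS LinkACount,
       COALESCE(b.LinkBCount, 0) AS LinkBCount
FROM basic
LEFT JOIN (SELECT id, COUNT(linkA_ID) AS LinkACount FROM LinkA GROUP BY id) AS a
       ON a.id = basic.id
LEFT JOIN (SELECT id, COUNT(linkB_ID) AS LinkBCount FROM LinkB GROUP BY id) AS b
       ON b.id = basic.id
ORDER BY basic.id
""").fetchall()

print(naive)    # [(2, 15, 15), (3, 6, 6)]
print(derived)  # [(2, 3, 5), (3, 2, 3)]
```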
Try this (using correlated subqueries):
SELECT
basic.id
,basic.name
,basic.description
,(select Count(linkA_ID) from LinkA where LinkA.id=basic.id) as LinkACount
,(select Count(linkB_ID) from LinkB where LinkB.id=basic.id) as LinkBCount
FROM basic
Method 2 (using a CTE):
with a as (select id, Count(linkA_ID) LinkACount from LinkA group by id)
, b as (select id, Count(linkB_ID) LinkBCount from LinkB group by id)
select basic.id, coalesce(a.LinkACount, 0) LinkACount, coalesce(b.LinkBCount, 0) LinkBCount
from basic
left join a on (a.id=basic.id)
left join b on (b.id=basic.id)
If you select all columns from the joined tables without grouping, you can see why your query cannot work:
SELECT
*
FROM
basic a
JOIN linkA b on (a.id = b.id)
JOIN linkb c on (a.id = c.id)
WHERE a.ID = 3
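For each id, the joins return every combination of a LinkA row with a LinkB row, so both COUNTs see the same number of rows. Just use DISTINCT in your counts: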
SELECT
a.id
,a.name
,a.description
,COUNT(DISTINCT(b.linkA_ID)) AS linkA_count
,COUNT(DISTINCT(c.linkB_ID)) AS linkb_count
FROM
basic a
JOIN linkA b on (a.id = b.id)
JOIN linkb c on (a.id = c.id)
GROUP BY
a.id
,a.name
,a.description
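A caveat on COUNT(DISTINCT ...): it counts distinct values, not rows, so it matches the real row counts only while each linkA_ID and linkB_ID appears once per id. The sketch below (Python's sqlite3 module as a stand-in, the question's sample rows plus one hypothetical duplicate LinkA row) shows both cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE basic (id INTEGER, name TEXT, description TEXT);
CREATE TABLE LinkA (id INTEGER, linkA_ID INTEGER);
CREATE TABLE LinkB (id INTEGER, linkB_ID INTEGER);
INSERT INTO basic VALUES (2, 'Name1', 'description2'), (3, 'Name2', 'description3');
INSERT INTO LinkA VALUES (2, 344), (3, 3221), (2, 6642), (3, 2312), (2, 323);
INSERT INTO LinkB VALUES (2, 8287), (3, 42466), (2, 616422), (3, 531),
                         (2, 2555), (2, 8592), (3, 1122), (2, 33345);
""")

QUERY = """
SELECT a.id, COUNT(DISTINCT b.linkA_ID), COUNT(DISTINCT c.linkB_ID)
FROM basic a
JOIN LinkA b ON a.id = b.id
JOIN LinkB c ON a.id = c.id
GROUP BY a.id ORDER BY a.id
"""

before = conn.execute(QUERY).fetchall()

# Hypothetical extra row: the same linkA_ID attached to id 2 a second time.
conn.execute("INSERT INTO LinkA VALUES (2, 344)")
after = conn.execute(QUERY).fetchall()
linkA_rows_for_2 = conn.execute("SELECT COUNT(*) FROM LinkA WHERE id = 2").fetchone()[0]

print(before)            # [(2, 3, 5), (3, 2, 3)]
print(after)             # unchanged: the duplicate value is not counted again
print(linkA_rows_for_2)  # 4 rows, but COUNT(DISTINCT) still reports 3
```

If duplicates like this can occur and you want row counts, the derived-table approach is the safer choice.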

How to select rows by max value from another column in Oracle

I have two tables in Oracle, Table1 and Table2.
When I run this:
SELECT A.ID, B.NUM_X
FROM TABLE1 A
LEFT JOIN TABLE2 B ON A.ID=B.ID
WHERE B.BOOK = 1
It returns this.
ID NUM_X
1 10
1 5
1 9
2 2
2 1
3 20
3 11
What I want is each distinct ID with its MAX NUM_X value, something like this:
ID NUM_X
1 10
2 2
3 20
You can use aggregation:
SELECT A.ID, MAX(B.NUM_X)
FROM TABLE1 A LEFT JOIN
TABLE2 B
ON A.ID = B.ID
WHERE B.BOOK = 1
GROUP BY A.ID;
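Run against the rows shown in the question, the aggregation gives the expected result. This sketch uses Python's sqlite3 module as a stand-in for Oracle (MAX with GROUP BY behaves the same way here); the TABLE2 rows are taken from the question's output, so they all have BOOK = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE1 (ID INTEGER);
CREATE TABLE TABLE2 (ID INTEGER, NUM_X INTEGER, BOOK INTEGER);
INSERT INTO TABLE1 VALUES (1), (2), (3);
-- The rows the question shows (all of them passed the BOOK = 1 filter).
INSERT INTO TABLE2 VALUES
  (1, 10, 1), (1, 5, 1), (1, 9, 1), (2, 2, 1), (2, 1, 1), (3, 20, 1), (3, 11, 1);
""")

rows = conn.execute("""
SELECT A.ID, MAX(B.NUM_X)
FROM TABLE1 A
LEFT JOIN TABLE2 B ON A.ID = B.ID
WHERE B.BOOK = 1
GROUP BY A.ID
ORDER BY A.ID
""").fetchall()

print(rows)  # [(1, 10), (2, 2), (3, 20)]
```

Because the WHERE clause filters on a column of B, IDs without a BOOK = 1 row drop out, so the LEFT JOIN behaves like an inner join here.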
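If you want additional columns, I would recommend window functions. Put the BOOK filter inside the subquery, so that ROW_NUMBER() ranks only the BOOK = 1 rows (IDs without such a row come back with a NULL NUM_X):
SELECT A.ID, B.NUM_X -- add any other columns of B here
FROM TABLE1 A LEFT JOIN
(SELECT B.*,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NUM_X DESC) as seqnum
FROM TABLE2 B
WHERE B.BOOK = 1
) B
ON A.ID = B.ID AND B.seqnum = 1;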
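Where the BOOK filter goes matters with ROW_NUMBER(): applied after the ranking, an ID whose highest NUM_X is in another book loses its row entirely; applied inside the subquery, the ranking only sees BOOK = 1 rows. The sketch below shows both placements with Python's sqlite3 module (window functions need SQLite 3.25 or later), the question's rows, a hypothetical LABEL column standing in for "additional columns", and one hypothetical BOOK = 2 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE1 (ID INTEGER);
-- LABEL is a hypothetical extra column standing in for "additional columns".
CREATE TABLE TABLE2 (ID INTEGER, NUM_X INTEGER, BOOK INTEGER, LABEL TEXT);
INSERT INTO TABLE1 VALUES (1), (2), (3);
INSERT INTO TABLE2 VALUES
  (1, 10, 1, 'a'), (1, 5, 1, 'b'), (1, 9, 1, 'c'),
  (2, 2, 1, 'd'), (2, 1, 1, 'e'), (3, 20, 1, 'f'), (3, 11, 1, 'g'),
  -- Hypothetical row in another book with a larger NUM_X.
  (1, 50, 2, 'z');
""")

def top_rows(filter_inside):
    """Top NUM_X row per ID, with the BOOK filter before or after ranking."""
    inner_where = "WHERE B.BOOK = 1" if filter_inside else ""
    outer_where = "" if filter_inside else "WHERE B.BOOK = 1"
    sql = f"""
    SELECT A.ID, B.NUM_X, B.LABEL
    FROM TABLE1 A
    LEFT JOIN (SELECT B.*,
                      ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NUM_X DESC) AS seqnum
               FROM TABLE2 B
               {inner_where}) B
      ON A.ID = B.ID AND B.seqnum = 1
    {outer_where}
    ORDER BY A.ID
    """
    return conn.execute(sql).fetchall()

print(top_rows(True))   # [(1, 10, 'a'), (2, 2, 'd'), (3, 20, 'f')]
print(top_rows(False))  # [(2, 2, 'd'), (3, 20, 'f')]: ID 1's top row is in book 2
```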

Different output when using count and group by

When counting distinct IDs, I get a different answer when I group by day than when I don't.
select cv.CONV_DAY, count(distinct cv.CLICK_ID)
from
clickcache.click cc
right join(
select distinct cv.CLICK_ID, cv.CONV_DAY, cv.PIXEL_ID
from clickcache.CONVERSION cv
where cv.CLICK_ID IS NOT NULL) cv ON cv.CLICK_ID = cc.ID
where cc.ADV_ACCOUNT_ID = 25176
and cv.CONV_DAY between '2016-8-01' AND '2016-08-07'
and AMP_CLICK_STATUS_ID = 1
AND pixel_id IN
(SELECT DISTINCT conversion_pixel_id
FROM
ampx.campaign_event_funnel ef
JOIN ampx.campaign cp ON
cp.id = ef.campaign_id
AND cp.campaign_status_id = 1
WHERE
ef.account_id IN(25176)
AND include_optimization = 1 )
group by 1
order by 1 asc
This yields 170, which is the correct answer and the one I want. This query, on the other hand, returns 157:
select count(distinct cv.CLICK_ID)
from
clickcache.click cc
right join(
select distinct cv.CLICK_ID, cv.CONV_DAY, cv.PIXEL_ID
from clickcache.CONVERSION cv
where cv.CLICK_ID IS NOT NULL) cv ON cv.CLICK_ID = cc.ID
where cc.ADV_ACCOUNT_ID = 25176
and cv.CONV_DAY between '2016-8-01' AND '2016-08-07'
and AMP_CLICK_STATUS_ID = 1
AND pixel_id IN
(SELECT DISTINCT conversion_pixel_id
FROM
ampx.campaign_event_funnel ef
JOIN ampx.campaign cp ON
cp.id = ef.campaign_id
AND cp.campaign_status_id = 1
WHERE
ef.account_id IN(25176)
AND include_optimization = 1 )
My question is: why do I get this discrepancy, and how do I fix it to get a proper count?
Thank you!
Your count depends on the right-hand (derived) query. Maybe you have duplicate rows there? The subquery selects distinct (CLICK_ID, CONV_DAY, PIXEL_ID), so a CLICK_ID that appears on more than one CONV_DAY is counted once for each day when you group by day, but only once when you don't group.
Example:
table1
id name value
1 2 3
table2
id name value
1 4 5
2 6 3
1 6 3
Right join the tables on value:
select * from table1 a right join table2 b on a.value = b.value
NULL NULL NULL 1 4 5
1 2 3 2 6 3
1 2 3 1 6 3
select count(distinct a.value)
from table1 a right join table2 b on a.value = b.value
The result is 1.
select b.id, count(distinct a.value)
from table1 a right join table2 b on a.value = b.value
group by b.id
The result is two rows:
2 1
1 1
The per-group counts add up to 2, while the overall distinct count is 1. My guess is that this is why you get different numbers.
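The same toy example can be run with Python's sqlite3 module. It is written as table2 LEFT JOIN table1, which is equivalent to table1 RIGHT JOIN table2; SQLite only added RIGHT JOIN in version 3.39, so this spelling also works on older builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, name INTEGER, value INTEGER);
CREATE TABLE table2 (id INTEGER, name INTEGER, value INTEGER);
INSERT INTO table1 VALUES (1, 2, 3);
INSERT INTO table2 VALUES (1, 4, 5), (2, 6, 3), (1, 6, 3);
""")

# table2 b LEFT JOIN table1 a is the same as table1 a RIGHT JOIN table2 b.
JOIN_CLAUSE = "FROM table2 b LEFT JOIN table1 a ON a.value = b.value"

overall = conn.execute(f"SELECT COUNT(DISTINCT a.value) {JOIN_CLAUSE}").fetchone()[0]
per_group = conn.execute(
    f"SELECT b.id, COUNT(DISTINCT a.value) {JOIN_CLAUSE} GROUP BY b.id ORDER BY b.id"
).fetchall()

print(overall)                       # 1
print(per_group)                     # [(1, 1), (2, 1)]
print(sum(n for _, n in per_group))  # 2: value 3 is counted once in each group
```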

Finding value when multiple rows by group by

Let's say we have the following data sets
tbl_building:
id -- name
1 -- building 1
2 -- building 2
tbl_rooms:
id -- building_id -- room_id -- light_status
1 ------ 1 ------------- 1 ----------- 0
2 ------ 1 ------------- 2 ----------- 1
3 ------ 1 ------------- 3 ----------- 0
4 ------ 2 ------------- 1 ----------- 1
How would I construct a single SQL statement to find out which buildings have a light switched on, in a YES/NO format, while grouping by building name?
Ideally, I want something like the following:
SELECT b.name, if(light_status, 'yes', 'no') as light_status
FROM tbl_building b
JOIN tbl_rooms r on b.id = r.building_id
group by b.id
However, which room it brings back for each building seems to be random.
Select b.name, case when sum(a.light_status) > 0 then 'YES' else 'NO' end as LightStatus
From tbl_rooms a
Join tbl_building b
On a.building_id = b.id
Group by b.name
A simple case for a semi-join:
SELECT name
FROM tbl_building b
WHERE EXISTS (
SELECT 1
FROM tbl_rooms r
WHERE b.id = r.building_id
AND light_status = 1
)
This will return the buildings where at least one room has its light switched on.
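A runnable sketch of the semi-join, using Python's sqlite3 module with the question's rows plus a hypothetical building 3 whose only room has its light off (with the original sample data both buildings have a light on, so the filter would not visibly exclude anything).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_building (id INTEGER, name TEXT);
CREATE TABLE tbl_rooms (id INTEGER, building_id INTEGER, room_id INTEGER, light_status INTEGER);
INSERT INTO tbl_building VALUES (1, 'building 1'), (2, 'building 2'),
  (3, 'building 3');  -- hypothetical: every light off
INSERT INTO tbl_rooms VALUES (1, 1, 1, 0), (2, 1, 2, 1), (3, 1, 3, 0), (4, 2, 1, 1),
  (5, 3, 1, 0);       -- hypothetical room in building 3
""")

# EXISTS stops at the first lit room and never duplicates a building.
lit = conn.execute("""
SELECT name
FROM tbl_building b
WHERE EXISTS (
  SELECT 1 FROM tbl_rooms r
  WHERE b.id = r.building_id AND light_status = 1
)
ORDER BY name
""").fetchall()

print(lit)  # [('building 1',), ('building 2',)]
```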
select (case when tbl_rooms.light_status = 1 then building_id end) as building_id_on,
(case when tbl_rooms.light_status = 0 then building_id end) as building_id_off
from tbl_building inner join tbl_rooms on tbl_building.id = tbl_rooms.building_id
Try this:
SELECT DISTINCT B.name
FROM tbl_rooms A
INNER JOIN tbl_building B
ON A.building_id = B.id
To find buildings with at least one room with the lights turned on:
SELECT
B.name
FROM tbl_rooms A
INNER JOIN tbl_building B
ON A.building_id = B.id
GROUP BY B.name
HAVING MAX(light_status) = 1
To list all buildings and whether or not they have at least one room with the light turned on:
SELECT
B.name, IIF(MAX(light_status) = 1, 'YES', 'NO') as light_status
FROM tbl_rooms A
INNER JOIN tbl_building B
ON A.building_id = B.id
GROUP BY B.name
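SQLite only gained IIF in version 3.32, so this sketch (Python's sqlite3 module as a stand-in) writes the YES/NO column with a portable CASE expression instead; otherwise it follows the MAX(light_status) approach above. The data is the question's plus a hypothetical building 3 with all lights off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_building (id INTEGER, name TEXT);
CREATE TABLE tbl_rooms (id INTEGER, building_id INTEGER, room_id INTEGER, light_status INTEGER);
INSERT INTO tbl_building VALUES (1, 'building 1'), (2, 'building 2'),
  (3, 'building 3');  -- hypothetical: every light off
INSERT INTO tbl_rooms VALUES (1, 1, 1, 0), (2, 1, 2, 1), (3, 1, 3, 0), (4, 2, 1, 1),
  (5, 3, 1, 0);       -- hypothetical room in building 3
""")

# MAX(light_status) is 1 as soon as any room in the group is lit.
status = conn.execute("""
SELECT B.name,
       CASE WHEN MAX(A.light_status) = 1 THEN 'YES' ELSE 'NO' END AS light_status
FROM tbl_rooms A
INNER JOIN tbl_building B ON A.building_id = B.id
GROUP BY B.name
ORDER BY B.name
""").fetchall()

print(status)  # [('building 1', 'YES'), ('building 2', 'YES'), ('building 3', 'NO')]
```

Unlike the non-aggregated query in the question, the result no longer depends on which room the database happens to pick for each building.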

T-SQL Removing multiple LEFT JOIN

I have this query. It returns ColA and ColB from TableA and UserName from the Users table, then displays several fields from TableB as additional columns in the results. It works, but is there a better way than using these multiple LEFT JOINs?
SELECT a.COlA, a.ColB, u.UserName,
b1.Value,
b2.Value,
b3.Value,
b4.Value
FROM TableA a JOIN Users u ON a.UserId = u.UserId
LEFT JOIN TableB b1 ON a.EventId = b1.EventId AND b1.Code = 5
LEFT JOIN TableB b2 ON a.EventId = b2.EventId AND b2.Code = 15
LEFT JOIN TableB b3 ON a.EventId = b3.EventId AND b3.Code = 18
LEFT JOIN TableB b4 ON a.EventId = b4.EventId AND b4.Code = 40
WHERE (a.UserId = 3) ORDER BY u.UserName ASC
TableB looks like:
Id | EventId | Code | Value
----------------------------
1 | 1 | 5 | textA
2 | 1 | 15 | textB
3 | 1 | 18 | textC
Sometimes Code is missing but for each event there are no duplicated Codes (so each LEFT JOIN is just another cell in the same result record).
I don't see why you want to change something that already works, but here is another way (it performs the same lookups as those LEFT JOINs, written as correlated subqueries):
SELECT a.COlA, a.ColB, u.UserName,
( SELECT b.Value FROM TableB b WHERE a.EventId = b.EventId AND b.Code = 5 ),
( SELECT b.Value FROM TableB b WHERE a.EventId = b.EventId AND b.Code = 15 ),
( SELECT b.Value FROM TableB b WHERE a.EventId = b.EventId AND b.Code = 18 ),
( SELECT b.Value FROM TableB b WHERE a.EventId = b.EventId AND b.Code = 40 )
FROM TableA a JOIN Users u ON a.UserId = u.UserId
WHERE (a.UserId = 3)
ORDER BY u.UserName ASC
Alternatively, use a single LEFT JOIN with conditional aggregation:
SELECT
a.COlA, a.ColB, u.UserName
,MAX(CASE WHEN b.Code = 5 THEN b.Value END) AS V5
,MAX(CASE WHEN b.Code = 15 THEN b.Value END) AS V15
,MAX(CASE WHEN b.Code = 18 THEN b.Value END) AS V18
,MAX(CASE WHEN b.Code = 40 THEN b.Value END) AS V40
,COUNT(CASE WHEN b.Code NOT IN (5,15,18,40) THEN 1 END) AS CountVOther
FROM TableA a
INNER JOIN Users u ON a.UserId = u.UserId
LEFT JOIN TableB b ON (a.EventId = b.EventId)
WHERE (a.UserId = 3)
GROUP BY a.EventId, a.colA, a.colB, u.Username
ORDER BY u.UserName ASC
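Here is a sketch of the conditional-aggregation idea, with each CASE testing b.Code and returning b.Value, checked against the original multiple LEFT JOINs. It uses Python's sqlite3 module as a stand-in for SQL Server; the TableB rows come from the question, while the Users and TableA rows (including the ColA/ColB values) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserId INTEGER, UserName TEXT);
CREATE TABLE TableA (ColA TEXT, ColB TEXT, UserId INTEGER, EventId INTEGER);
CREATE TABLE TableB (Id INTEGER, EventId INTEGER, Code INTEGER, Value TEXT);
-- Hypothetical user and TableA row; the TableB rows are from the question.
INSERT INTO Users VALUES (3, 'alice');
INSERT INTO TableA VALUES ('x', 'y', 3, 1);
INSERT INTO TableB VALUES (1, 1, 5, 'textA'), (2, 1, 15, 'textB'), (3, 1, 18, 'textC');
""")

# One join to TableB; each CASE keeps only the Value for its own Code.
rows = conn.execute("""
SELECT a.ColA, a.ColB, u.UserName,
       MAX(CASE WHEN b.Code = 5  THEN b.Value END) AS V5,
       MAX(CASE WHEN b.Code = 15 THEN b.Value END) AS V15,
       MAX(CASE WHEN b.Code = 18 THEN b.Value END) AS V18,
       MAX(CASE WHEN b.Code = 40 THEN b.Value END) AS V40
FROM TableA a
INNER JOIN Users u ON a.UserId = u.UserId
LEFT JOIN TableB b ON a.EventId = b.EventId
WHERE a.UserId = 3
GROUP BY a.EventId, a.ColA, a.ColB, u.UserName
ORDER BY u.UserName
""").fetchall()

# The question's original shape, for comparison.
joined = conn.execute("""
SELECT a.ColA, a.ColB, u.UserName, b1.Value, b2.Value, b3.Value, b4.Value
FROM TableA a JOIN Users u ON a.UserId = u.UserId
LEFT JOIN TableB b1 ON a.EventId = b1.EventId AND b1.Code = 5
LEFT JOIN TableB b2 ON a.EventId = b2.EventId AND b2.Code = 15
LEFT JOIN TableB b3 ON a.EventId = b3.EventId AND b3.Code = 18
LEFT JOIN TableB b4 ON a.EventId = b4.EventId AND b4.Code = 40
WHERE a.UserId = 3
""").fetchall()

print(rows)  # [('x', 'y', 'alice', 'textA', 'textB', 'textC', None)]
```

Code 40 has no row for this event, so V40 comes back NULL in both versions, just as a missing LEFT JOIN match would.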