SQL split repeating rows caused by UNION - sql

I am writing a query to look through and get two seperate averages based on where conditions.
I tried two select statetments but ended up with lots of duplicates.
Now I have a union which works pretty well, although I have my two fields in alternating rows instead of seperate columns.
Can anyone suggest a fix, sorry for the dodgy code!
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `cohortPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
teacherGroup = '9JS2/Cp'
AND tblTestScores.testDetailsID = 1
GROUP BY
skillName
UNION ALL
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `groupPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
tblTestScores.testDetailsID = 1
GROUP BY
skillName
ORDER BY
skillUID ASC

Related

Multiple Joins less cost way

Below query has 3 tables where I have to do 2 joins to get a column information, It is very slow, is there any effective way to run this query?
SELECT DISTINCT
st.status_c1
FROM
schemaname.tablea st
INNER JOIN (
SELECT
lic.SpecId AS applicationid,
lic.comData AS combusappid,
lic.ageId,
lic.licId,
lic.licid,
lic.appid,
com.nybe_bustbl_id AS busid
FROM
schemaname.tableb lic
INNER JOIN tablec com ON lic.comData = com.comData
WHERE
lic.ageId = '12'
) rt ON
st.ageId = rt.ageId
AND
st.licId = rt.licId
AND
st.licid = rt.licid
AND
st.appid = rt.appid
WHERE
status_id = 3;
Your current query will create extra rows when the JOIN condition is met for multiple entries in either table and then DISTINCT will filter these duplicates out. You could try to cut down the amount of work filtering duplicates by using EXISTS:
SELECT DISTINCT
st.status_c1
FROM schemaname.tablea st
WHERE status_id = 3
AND EXISTS (
SELECT 1
FROM schemaname.tableb lic
WHERE lic.ageId = '12'
AND st.ageId = lic.ageId
AND st.licId = lic.licId
AND st.appid = lic.appid
AND EXISTS(
SELECT 1 FROM tablec com WHERE lic.comData = com.comData
)
);
There is a bunch of redundancy in the query (licid is in the SELECT and ON twice) and you don't need to use subqueries for this. I think this will work:
SELECT DISTINCT st.status_c1
FROM tablea st
INNER JOIN tableb lic ON st.ageId = lic.ageId
AND st.licId = lic.licId
AND st.appid = lic.appid
INNER JOIN tablec com ON lic.comData = com.comData
WHERE status_id = 3
and lic.ageId = '12'
How frequently are you going to run this query, how much time is it taking now and what is the explectation. Are statistcs run on all tha tables.
There are many things which we can think of, but to start with if possible could you plese give ue the like the table structure and explain plan of the query.
Also may be an index on status_c1 table tablea help. As pointed out try removing the join condition which is twice AND st.licid = rt.licid
SELECT DISTINCT st.status_c1
FROM schemaname.tablea st
INNER JOIN (
SELECT
lic.SpecId AS applicationid, lic.comData AS combusappid, lic.ageId, lic.licId, lic.licid,
lic.appid, com.nybe_bustbl_id AS busid
FROM schemaname.tableb lic
INNER JOIN tablec com ON lic.comData = com.comData
WHERE lic.ageId = '12'
) rt ON st.ageId = rt.ageId AND st.licId = rt.licId AND st.licid = rt.licid AND st.appid = rt.appid
WHERE status_id = 3;

SELECT NOT IN with multiple columns in subquery

Regarding the statement below, sltrxid can exist as both ardoccrid and ardocdbid. I'm wanting to know how to include both in the NOT IN subquery.
SELECT *
FROM glsltransaction A
INNER JOIN cocustomer B ON A.acctid = B.customerid
WHERE sltrxstate = 4
AND araccttype = 1
AND sltrxid NOT IN(
SELECT ardoccrid,ardocdbid
FROM arapplyitem)
I would recommend not exists:
SELECT *
FROM glsltransaction t
INNER JOIN cocustomer c ON c.customerid = t.acctid
WHERE
??.sltrxstate = 4
AND ??.araccttype = 1
AND NOT EXISTS (
SELECT 1
FROM arapplyitem a
WHERE ??.sltrxid IN (a.ardoccrid, a.ardocdbid)
)
Note that I changed the table aliases to things that are more meaningful. I would strongly recommend prefixing the column names with the table they belong to, so the query is unambiguous - in absence of any indication, I represented this as ?? in the query.
IN sometimes optimize poorly. There are situations where two subqueries are more efficient:
SELECT *
FROM glsltransaction t
INNER JOIN cocustomer c ON c.customerid = t.acctid
WHERE
??.sltrxstate = 4
AND ??.araccttype = 1
AND NOT EXISTS (
SELECT 1
FROM arapplyitem a
WHERE ??.sltrxid = a.ardoccrid
)
AND NOT EXISTS (
SELECT 1
FROM arapplyitem a
WHERE ??.sltrxid = a.ardocdbid
)

Existing query optimization

We have 5 tables and we are trying to create a view to get the results.
Below is the view which is working fine.
I need suggestions. Is it a good practice to write this query in this way or it can be optimized in a better way.
SELECT p.Pid, hc.hcid, hc.Accomodation, ghc.ghcid, ghc.ProductFeatures, wp.existing, wp.acute, mc.cardiaccover, mc.cardiaclimitationperiod
FROM TableA p
LEFT JOIN TableB hc
ON p.pid = hc.pid
LEFT JOIN TableC ghc
ON p.pid = ghc.pid
LEFT JOIN (SELECT *
FROM (SELECT hcid,
title,
wperiodvalue + '-' + CASE WHEN
wperiodvalue > 1 THEN
unit +
's' ELSE
unit END wperiod
FROM TableD) d
PIVOT ( Max(wperiod)
FOR title IN (acute,
existing
) ) piv1) wp
ON hc.hcid = wp.hcid
LEFT JOIN (SELECT *
FROM (SELECT hcid,
title + col new_col,
value
FROM TableE
CROSS apply ( VALUES (cover,
'Cover'),
(Cast(limitationperiod AS
VARCHAR
(10)),
'LimitationPeriod') ) x (value, col
)) d
PIVOT ( Max(value)
FOR new_col IN (cardiaccover,
cardiaclimitationperiod,
cataracteyelenscover,
cataracteyelenslimitationperiod
) ) piv2) mc
ON hc.hcid = mc.hcid
Any suggestions would be appreciated.
Thanks
My suggestion is to break down the query using temporary table, create stored procedure then dump data in the one new table and with the help of that table you can create view:
Store both PIVOT result in tow seperate temp tables as
SELECT * INTO #pvtInfo FROM ( --first PIVOT query
SELECT * INTO #pvtInfoTwo FROM ( --second PIVOT query
Then your final query will be as :
SELECT p.Pid,
hc.hcid,
hc.Accomodation,
ghc.ghcid,
ghc.ProductFeatures,
wp.existing,
wp.acute,
mc.cardiaccover,
mc.cardiaclimitationperiod
FROM TableA p
LEFT JOIN TableB hc ON p.pid = hc.pid
LEFT JOIN TableC ghc ON p.pid = ghc.pid
LEFT JOIN #pvtInfo wp ON hc.hcid = wp.hcid
LEFT JOIN #pvtInfoTwo mc ON hc.hcid = mc.hcid
First you can try then only go with SP and VIEW.
Hope, It will help.

How can I join on multiple columns within the same table that contain the same type of info?

I am currently joining two tables based on Claim_Number and Customer_Number.
SELECT
A.*,
B.*,
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbp.Compound_Info AS B ON A.Claim_Number = B.Claim_Number AND A.Customer_Number = B.Customer_Number
WHERE A.Filled_YearMonth = '201312' AND A.Compound_Ind = 'Y'
This returns exactly the data I'm looking for. The problem is that I now need to join to another table to get information based on a Product_ID. This would be easy if there was only one Product_ID in the Compound_Info table for each record. However, there are 10. So basically I need to SELECT 10 additional columns for Product_Name based on each of those Product_ID's that are being selected already. How can do that? This is what I was thinking in my head, but is not working right.
SELECT
A.*,
B.*,
PD_Info_1.Product_Name,
PD_Info_2.Product_Name,
....etc {Up to 10 Product Names}
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B ON A.Claim_Number = B.Claim_Number AND A.Customer_Number = B.Customer_Number
LEFT JOIN Company.dbo.Product_Info AS PD_Info_1 ON B.Product_ID_1 = PD_Info_1.Product_ID
LEFT JOIN Company.dbo.Product_Info AS PD_Info_2 ON B.Product_ID_2 = PD_Info_2.Product_ID
.... {Up to 10 LEFT JOIN's}
WHERE A.Filled_YearMonth = '201312' AND A.Compound_Ind = 'Y'
This query not only doesn't return the correct results, it also takes forever to run. My actual SQL is a lot longer and I've changed table names, etc but I hope that you can get the idea. If it matters, I will be creating a view based on this query.
Please advise on how to select multiple columns from the same table correctly and efficiently. Thanks!
I found put my extra stuff into CTE and add ROW_NUMBER to insure that I get only 1 row that I care about. it would look something like this. I only did for first 2 product info.
WITH PD_Info
AS ( SELECT Product_ID
,Product_Name
,Effective_Date
,ROW_NUMBER() OVER ( PARTITION BY Product_ID, Product_Name ORDER BY Effective_Date DESC ) AS RowNum
FROM Company.dbo.Product_Info)
SELECT A.*
,B.*
,PD_Info_1.Product_Name
,PD_Info_2.Product_Name
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B
ON A.Claim_Number = B.Claim_Number
AND A.Customer_Number = B.Customer_Number
LEFT JOIN PD_Info AS PD_Info_1
ON B.Product_ID_1 = PD_Info_1.Product_ID
AND B.Fill_Date >= PD_Info_1.Effective_Date
AND PD_Info_2.RowNum = 1
LEFT JOIN PD_Info AS PD_Info_2
ON B.Product_ID_2 = PD_Info_2.Product_ID
AND B.Fill_Date >= PD_Info_2.Effective_Date
AND PD_Info_2.RowNum = 1

Multiple MAX values select using inner join

I have query that work for me only when values in the StakeValue don't repeat.
Basically, I need to select maximum values from SI_STAKES table with their relations from two other tables grouped by internal type.
SELECT a.StakeValue, b.[StakeName], c.[ProviderName]
FROM SI_STAKES AS a
INNER JOIN SI_STAKESTYPES AS b ON a.[StakeTypeID] = b.[ID]
INNER JOIN SI_PROVIDERS AS c ON a.[ProviderID] = c.[ID] WHERE a.[EventID]=6
AND a.[StakeGroupTypeID]=1
AND a.StakeValue IN
(SELECT MAX(d.StakeValue) FROM SI_STAKES AS d
WHERE d.[EventID]=a.[EventID] AND d.[StakeGroupTypeID]=a.[StakeGroupTypeID]
GROUP BY d.[StakeTypeID])
ORDER BY b.[StakeName], a.[StakeValue] DESC
Results for example must be:
[ID] [MaxValue] [StakeTypeID] [ProviderName]
1 1,5 6 provider1
2 3,75 7 provider2
3 7,6 8 provider3
Thank you for your help
There are two problems to solve here.
1) Finding the max values per type. This will get the Max value per StakeType and make sure that we do the exercise only for the wanted events and group type.
SELECT StakeGroupTypeID, EventID, StakeTypeID, MAX(StakeValue) AS MaxStakeValue
FROM SI_STAKES
WHERE Stake.[EventID]=6
AND Stake.[StakeGroupTypeID]=1
GROUP BY StakeGroupTypeID, EventID, StakeTypeID
2) Then we need to get only one return back for that value since it may be present more then once.
Using the Max Value, we must find a unique row for each I usually do this by getting the Max ID is has the added advantage of getting me the most recent entry.
SELECT MAX(SMaxID.ID) AS ID
FROM SI_STAKES AS SMaxID
INNER JOIN (
SELECT StakeGroupTypeID, EventID, StakeTypeID, MAX(StakeValue) AS MaxStakeValue
FROM SI_STAKES
WHERE Stake.[EventID]=6
AND Stake.[StakeGroupTypeID]=1
GROUP BY StakeGroupTypeID, EventID, StakeTypeID
) AS SMaxVal ON SMaxID.StakeTypeID = SMaxVal.StakeTypeID
AND SMaxID.StakeValue = SMaxVal.MaxStakeValue
AND SMaxID.EventID = SMaxVal.EventID
AND SMaxID.StakeGroupTypeID = SMaxVal.StakeGroupTypeID
3) Now that we have the ID's of the rows that we want, we can just get that information.
SELECT Stakes.ID, Stakes.StakeValue, SType.StakeName, SProv.ProviderName
FROM SI_STAKES AS Stakes
INNER JOIN SI_STAKESTYPES AS SType ON Stake.[StakeTypeID] = SType.[ID]
INNER JOIN SI_PROVIDERS AS SProv ON Stake.[ProviderID] = SProv.[ID]
WHERE Stake.ID IN (
SELECT MAX(SMaxID.ID) AS ID
FROM SI_STAKES AS SMaxID
INNER JOIN (
SELECT StakeGroupTypeID, EventID, StakeTypeID, MAX(StakeValue) AS MaxStakeValue
FROM SI_STAKES
WHERE Stake.[EventID]=6
AND Stake.[StakeGroupTypeID]=1
GROUP BY StakeGroupTypeID, EventID, StakeTypeID
) AS SMaxVal ON SMaxID.StakeTypeID = SMaxVal.StakeTypeID
AND SMaxID.StakeValue = SMaxVal.MaxStakeValue
AND SMaxID.EventID = SMaxVal.EventID
AND SMaxID.StakeGroupTypeID = SMaxVal.StakeGroupTypeID
)
You can use the over clause since you're using T-SQL (hopefully 2005+):
select distinct
a.stakevalue,
max(a.stakevalue) over (partition by a.staketypeid) as maxvalue,
b.staketypeid,
c.providername
from
si_stakes a
inner join si_stakestypes b on
a.staketypeid = b.id
inner join si_providers c on
a.providerid = c.id
where
a.eventid = 6
and a.stakegrouptypeid = 1
Essentially, this will find the max a.stakevalue for each a.staketypeid. Using a distinct will return one and only one row. Now, if you wanted to include the min a.id along with it, you could use row_number to accomplish this:
select
s.id,
s.maxvalue,
s.staketypeid,
s.providername
from (
select
row_number() over (order by a.stakevalue desc
partition by a.staketypeid) as rownum,
a.id,
a.stakevalue as maxvalue,
b.staketypeid,
c.providername
from
si_stakes a
inner join si_stakestypes b on
a.staketypeid = b.id
inner join si_providers c on
a.providerid = c.id
where
a.eventid = 6
and a.stakegrouptypeid = 1
) s
where
s.rownum = 1