SQL Server - UNION ALL - sql

I'm new to SQL development and I need to do UNION on two select statements. Below is a sample query. The Join tables & conditions, where criteria, columns names and everything is the same in both the select statements except the the primary tables after the FROM clause. I just wanted to know if there is a way to have a single static select query, instead of repeating the same query twice for the UNION (without going for a dynamic query).
SELECT Sum(ABC.Intakes) As TotalIntakes, Sum(ABC.ClientTarget) as TotalClientTarget
FROM(
SELECT Sum(tt.IntakesReceived) As Intakes, Sum(tt.ClientTarget) As ClientTarget,
tt.ProgramId
FROM
(SELECT Count(DISTINCT ClientID) As IntakesReceived,
DATEDIFF(MONTH, L.AwardStartDate, L.AwardEndDate)*L.MonthlyClientTarget As ClientTarget,
L.AwardId, L.ProgramId
FROM IntakeCoverageLegacy As L
LEFT JOIN UserRoleEntity URE ON URE.EntityId = L.AwardId
LEFT JOIN CDPUserRole UR ON URE.UserRoleId = UR.Id AND UR.CDPUserId = #UserId
WHERE (#Program IS NULL OR L.ProgramId IN (SELECT ProgramID FROM #ProgramIDList)
AND (ufn_IsInternalUser(#UserId) = 1
OR (ufn_IsInternalUser(#UserId) = 0 AND UR.CDPUserId = #UserId ))
GROUP BY L.AwardId, L.ProgramId) As tt
GROUP BY tt.ProgramId, tt.ProgramName
UNION ALL
SELECT Sum(tt.IntakesReceived) As Intakes, Sum(tt.ClientTarget) As ClientTarget,
tt.ProgramId
FROM
(SELECT Count(DISTINCT C.ClientID) As IntakesReceived,
DATEDIFF(MONTH, C.AwardStartDate, C.AwardEndDate)*L.MonthlyClientTarget As ClientTarget,
C.AwardId, C.ProgramId
FROM IntakeCoverageCDP As C
LEFT JOIN UserRoleEntity URE ON URE.EntityId = L.AwardId
LEFT JOIN CDPUserRole UR ON URE.UserRoleId = UR.Id AND UR.CDPUserId = #UserId
WHERE (#Program IS NULL OR C.ProgramId IN (SELECT ProgramID FROM #ProgramIDList)
AND (ufn_IsInternalUser(#UserId) = 1
OR (ufn_IsInternalUser(#UserId) = 0 AND UR.CDPUserId = #UserId ))
GROUP BY C.AwardId, C.ProgramId) As tt
GROUP BY tt.ProgramId, tt.ProgramName
) As ABC
GROUP BY ABC.ProgramId
OK... What I posted earlier was a sample query and I've updated the sample to my actual query to make it more clear. It's just the primary tables that are different. My requirement is that - after doing UNION ALL, I need to sum the aggregate columns in the final result, grouping by ProgramId.

I would probably first use UNION for the Client and LegacyClient tables as a derived table and then perform the JOINs:
SELECT C.AwardId,
C.ProgramName,
COUNT(ClientId) AS Intakes
FROM ( SELECT AwardId,
ProgramName,
Id
FROM Client
WHERE Id = #ClientId
UNION
SELECT AwardId,
ProgramName,
Id
FROM LegacyClient
WHERE Id = #ClientId) C
LEFT JOIN UserRoleEntity URE
ON C.AwardId = URE.EntityId
LEFT JOIN UserRole UR
ON URE.UserRoleId = UR.Id AND UR.CDPUserId = #UserId
WHERE (testFunction(#UserId) = 0
OR (testFunction(#UserId) <> 0 AND UR.CDPUserId = #UserId))
GROUP BY C.AwardId,
C.ProgramName;

SELECT C.AwardId, C.ProgramName, Count(ClientId) as Intakes
FROM
(
SELECT Id, AwardId, ProgramName, ClientId FROM Client UNION ALL
SELECT Id, AwardId, ProgramName, ClientId FROM LegacyClient
) C
LEFT OUTER JOIN UserRoleEntity URE ON C.AwardId = URE.EntityId
LEFT OUTER JOIN UserRole UR ON URE.UserRoleId = UR.Id AND UR.CDPUserId = #UserId
WHERE
C.Id = #ClientId
AND (testFunction(#UserId) = 0 OR UR.CDPUserId = #UserId)
GROUP BY C.AwardId, C.ProgramName
Using testFunction() twice isn't really necessary (unless null is one of the outputs.)
You might also prefer to filter on ClientId outside of the union. I'm guess your purpose in rewriting it to avoid the duplicated logic. You might still want to see which one is better handled by the optimizer.
Also, I used a UNION ALL. I'm thinking you imagine only one result from one of the two tables. As you originally wrote it that count column is going to factor into the union.
Counting on ClientId seems odd. So does having a parameter named #ClientId that doesn't seem to match up with the ClientId column.

Related

SQL split repeating rows caused by UNION

I am writing a query to look through and get two seperate averages based on where conditions.
I tried two select statetments but ended up with lots of duplicates.
Now I have a union which works pretty well, although I have my two fields in alternating rows instead of seperate columns.
Can anyone suggest a fix, sorry for the dodgy code!
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `cohortPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
teacherGroup = '9JS2/Cp'
AND tblTestScores.testDetailsID = 1
GROUP BY
skillName
UNION ALL
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `groupPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
tblTestScores.testDetailsID = 1
GROUP BY
skillName
ORDER BY
skillUID ASC

Multiple Joins less cost way

Below query has 3 tables where I have to do 2 joins to get a column information, It is very slow, is there any effective way to run this query?
SELECT DISTINCT
st.status_c1
FROM
schemaname.tablea st
INNER JOIN (
SELECT
lic.SpecId AS applicationid,
lic.comData AS combusappid,
lic.ageId,
lic.licId,
lic.licid,
lic.appid,
com.nybe_bustbl_id AS busid
FROM
schemaname.tableb lic
INNER JOIN tablec com ON lic.comData = com.comData
WHERE
lic.ageId = '12'
) rt ON
st.ageId = rt.ageId
AND
st.licId = rt.licId
AND
st.licid = rt.licid
AND
st.appid = rt.appid
WHERE
status_id = 3;
Your current query will create extra rows when the JOIN condition is met for multiple entries in either table and then DISTINCT will filter these duplicates out. You could try to cut down the amount of work filtering duplicates by using EXISTS:
SELECT DISTINCT
st.status_c1
FROM schemaname.tablea st
WHERE status_id = 3
AND EXISTS (
SELECT 1
FROM schemaname.tableb lic
WHERE lic.ageId = '12'
AND st.ageId = lic.ageId
AND st.licId = lic.licId
AND st.appid = lic.appid
AND EXISTS(
SELECT 1 FROM tablec com WHERE lic.comData = com.comData
)
);
There is a bunch of redundancy in the query (licid is in the SELECT and ON twice) and you don't need to use subqueries for this. I think this will work:
SELECT DISTINCT st.status_c1
FROM tablea st
INNER JOIN tableb lic ON st.ageId = lic.ageId
AND st.licId = lic.licId
AND st.appid = lic.appid
INNER JOIN tablec com ON lic.comData = com.comData
WHERE status_id = 3
and lic.ageId = '12'
How frequently are you going to run this query, how much time is it taking now and what is the explectation. Are statistcs run on all tha tables.
There are many things which we can think of, but to start with if possible could you plese give ue the like the table structure and explain plan of the query.
Also may be an index on status_c1 table tablea help. As pointed out try removing the join condition which is twice AND st.licid = rt.licid
SELECT DISTINCT st.status_c1
FROM schemaname.tablea st
INNER JOIN (
SELECT
lic.SpecId AS applicationid, lic.comData AS combusappid, lic.ageId, lic.licId, lic.licid,
lic.appid, com.nybe_bustbl_id AS busid
FROM schemaname.tableb lic
INNER JOIN tablec com ON lic.comData = com.comData
WHERE lic.ageId = '12'
) rt ON st.ageId = rt.ageId AND st.licId = rt.licId AND st.licid = rt.licid AND st.appid = rt.appid
WHERE status_id = 3;

How do you properly query the result of a complex join statement in SQL?

New to advanced SQL!
I'm trying to write a query that returns the COUNT(*) and SUM of the resulting columns from this query:
DECLARE #Id INT = 1000;
SELECT
*,
CASE
WHEN Id1 >= 6 THEN 1
ELSE 0
END AS Tier1,
CASE
WHEN Id1 >= 4 THEN 1
ELSE 0
END AS Tier2,
CASE
WHEN Id1 >= 2 THEN 1
ELSE 0
END AS Tier3
FROM (
SELECT
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName,
MAX(AppSubmitU_Level.Id1) AS Id1
FROM Org
INNER JOIN AppEmployment
ON AppEmployment.OrgID = Org.OrgID
INNER JOIN App
ON App.AppID = AppEmployment.AppID
INNER JOIN AppSubmit
ON App.AppID = AppSubmit.AppID
INNER JOIN AppSubmitU_Level
ON AppSubmit.LevelID = AppSubmitU_Level.Id1
INNER JOIN AppEmpU_VerifyStatus
ON AppEmpU_VerifyStatus.VerifyStatusID = AppEmployment.VerifyStatusID
WHERE AppSubmitU_Level.SubmitTypeID = 1 -- Career
AND AppEmpU_VerifyStatus.StatusIsVerified = 1
AND AppSubmit.[ExpireDate] IS NOT NULL
AND AppSubmit.[ExpireDate] > GETDATE()
AND Org.OrgID = #Id
GROUP BY
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName
) employees
I've tried to do so by moving the #Id outside the original query, and adding a SELECT(*), SUM, and SUM to the top, like so:
DECLARE #OrgID INT = 1000;
SELECT COUNT(*), SUM(employees.Tier1), SUM(employees.Tier2), SUM(employees.Tier3)
FROM
(SELECT *,
...
) AS employees
);
When I run the query, however, I'm getting the errors:
The multi-part identifier employees.Tier1 could not be bound
The same errors appear for the other identifiers in my SUM statements.
I'm assuming this has to do with the fact that the Tier1, Tier2, and Tier3 columns are being returned by the inner join query in my FROM(), and aren't values set by the existing tables that I'm querying. But I can't figure out how to rewrite it to initialize properly.
Thanks in advance for the help!
This is a scope problem: employees is defined in the subquery only, it is not available in the outer scope. You basically want to alias the outer query:
DECLARE #OrgID INT = 1000;
SELECT COUNT(*), SUM(employees.Tier1) TotalTier1, SUM(employees.Tier2) TotalTier2, SUM(employees.Tier3) TotalTier3
FROM (
SELECT *,
...
) AS employees
) AS employees;
--^ here
Note that I added column aliases to the outer query, which is a good practice in SQL.
It might be easier to understand what is going on if you use another alias for the outer query:
SELECT COUNT(*), SUM(e.Tier1), SUM(e.Tier2), SUM(e.Tier3)
FROM (
SELECT *,
...
) AS employees
) AS e;
Note that you don't actually need to qualify the column names in the outer query, since column names are unambigous anyway.
And finally: you don't actually need a subquery. You could write the query as:
SELECT
SUM(CASE WHEN Id1 >= 6 THEN 1 ELSE 0 END) AS TotalTier1,
SUM(CASE WHEN Id1 >= 4 THEN 1 ELSE 0 END) AS TotalTier2,
SUM(CASE WHEN Id1 >= 2 THEN 1 ELSE 0 END) AS TotalTier3
FROM (
SELECT
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName,
MAX(AppSubmitU_Level.Id1) AS Id1
FROM Org
INNER JOIN AppEmployment
ON AppEmployment.OrgID = Org.OrgID
INNER JOIN App
ON App.AppID = AppEmployment.AppID
INNER JOIN AppSubmit
ON App.AppID = AppSubmit.AppID
INNER JOIN AppSubmitU_Level
ON AppSubmit.LevelID = AppSubmitU_Level.Id1
INNER JOIN AppEmpU_VerifyStatus
ON AppEmpU_VerifyStatus.VerifyStatusID = AppEmployment.VerifyStatusID
WHERE AppSubmitU_Level.SubmitTypeID = 1 -- Career
AND AppEmpU_VerifyStatus.StatusIsVerified = 1
AND AppSubmit.[ExpireDate] IS NOT NULL
AND AppSubmit.[ExpireDate] > GETDATE()
AND Org.OrgID = #Id
GROUP BY
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName
) employees

Using the COUNT with multiple tables

I want to count the number of different sheep, and I want it in one table.
Like this;
Ewes | Rams | Lambs
8 | 5 | 12
The query I try is this, but it doesn't work;
SELECT COUNT(e.EweID) AS 'Ewe', COUNT(r.RamID) AS 'Ram', COUNT(l.LambID) AS 'Lamb'
FROM Sheep s
INNER JOIN Ewe e ON s.SheepID = e.EweID
INNER JOIN Ram r ON s.SheepID = r.RamID
INNER JOIN Lamb l ON s.SheepID = l.LambID
WHERE s.FarmerID = '123'
I don't get what I'm doing wrong, this is my database ERD;
I don't think you need a FROM here at all:
select
(select count(*) from Ram where Famerid = 123) as RamCount,
(select count(*) from Ewe where Famerid = 123) as Count,
(select count(*) from Lamb where Famerid = 123) as LambCount
(There is no relationship between the rows you are counting, do don't try and create one. Instead count each separately, wrapping it all in an outer select keeps everything in a single result row.)
I think that the problem here is that you don't need an INNER JOIN but an OUTER JOIN ...
SELECT COUNT(CASE WHEN e.EweID IN NOT NULL THEN e.EweID ELSE 0 END) AS 'Ewe', COUNT(r.RamID) AS 'Ram', COUNT(l.LambID) AS 'Lamb'
FROM Sheep s
LEFT OUTER JOIN Ewe e ON s.SheepID = e.EweID
LEFT OUTER JOIN Ram r ON s.SheepID = r.RamID
LEFT OUTER JOIN Lamb l ON s.SheepID = l.LambID
WHERE s.FarmerID = '123'
Take a look even at the case statement that I've added inside the first count(Ewe), to see a way to handle nulls in the count .
The Left Outer Join logical operator returns each row that satisfies
the join of the first (top) input with the second (bottom) input. It
also returns any rows from the first input that had no matching rows
in the second input. The nonmatching rows in the second input are
returned as null values. If no join predicate exists in the Argument
column, each row is a matching row.
Use correlated sub-selects to do the counting:
SELECT (select COUNT(*) from Ewe e where s.SheepID = e.EweID) AS 'Ewe',
(select COUNT(*) from Ram r where s.SheepID = r.RamID) AS 'Ram',
(select COUNT(*) from Lamb l where s.SheepID = l.LambID) AS 'Lamb'
FROM Sheep s
WHERE s.FarmerID = '123'
And you can also simply remove the WHERE clause to get all farms' counts.
DECLARE #Count1 INT;
SELECT #Count1 = COUNT(*)
FROM dbo.Ewe;
DECLARE #Count2 INT;
SELECT #Count2 = COUNT(*)
FROM dbo.Ram;
DECLARE #Count3 INT;
SELECT #Count3 = COUNT(*)
FROM dbo.Lamb;
SELECT #Count1 AS 'Ewe' ,
#Count2 AS 'Ram' ,
#Count3 AS 'Lamb'

Multiple MAX values select using inner join

I have query that work for me only when values in the StakeValue don't repeat.
Basically, I need to select maximum values from SI_STAKES table with their relations from two other tables grouped by internal type.
SELECT a.StakeValue, b.[StakeName], c.[ProviderName]
FROM SI_STAKES AS a
INNER JOIN SI_STAKESTYPES AS b ON a.[StakeTypeID] = b.[ID]
INNER JOIN SI_PROVIDERS AS c ON a.[ProviderID] = c.[ID] WHERE a.[EventID]=6
AND a.[StakeGroupTypeID]=1
AND a.StakeValue IN
(SELECT MAX(d.StakeValue) FROM SI_STAKES AS d
WHERE d.[EventID]=a.[EventID] AND d.[StakeGroupTypeID]=a.[StakeGroupTypeID]
GROUP BY d.[StakeTypeID])
ORDER BY b.[StakeName], a.[StakeValue] DESC
Results for example must be:
[ID] [MaxValue] [StakeTypeID] [ProviderName]
1 1,5 6 provider1
2 3,75 7 provider2
3 7,6 8 provider3
Thank you for your help
There are two problems to solve here.
1) Finding the max values per type. This will get the Max value per StakeType and make sure that we do the exercise only for the wanted events and group type.
SELECT StakeGroupTypeID, EventID, StakeTypeID, MAX(StakeValue) AS MaxStakeValue
FROM SI_STAKES
WHERE Stake.[EventID]=6
AND Stake.[StakeGroupTypeID]=1
GROUP BY StakeGroupTypeID, EventID, StakeTypeID
2) Then we need to get only one return back for that value since it may be present more then once.
Using the Max Value, we must find a unique row for each I usually do this by getting the Max ID is has the added advantage of getting me the most recent entry.
SELECT MAX(SMaxID.ID) AS ID
FROM SI_STAKES AS SMaxID
INNER JOIN (
SELECT StakeGroupTypeID, EventID, StakeTypeID, MAX(StakeValue) AS MaxStakeValue
FROM SI_STAKES
WHERE Stake.[EventID]=6
AND Stake.[StakeGroupTypeID]=1
GROUP BY StakeGroupTypeID, EventID, StakeTypeID
) AS SMaxVal ON SMaxID.StakeTypeID = SMaxVal.StakeTypeID
AND SMaxID.StakeValue = SMaxVal.MaxStakeValue
AND SMaxID.EventID = SMaxVal.EventID
AND SMaxID.StakeGroupTypeID = SMaxVal.StakeGroupTypeID
3) Now that we have the ID's of the rows that we want, we can just get that information.
SELECT Stakes.ID, Stakes.StakeValue, SType.StakeName, SProv.ProviderName
FROM SI_STAKES AS Stakes
INNER JOIN SI_STAKESTYPES AS SType ON Stake.[StakeTypeID] = SType.[ID]
INNER JOIN SI_PROVIDERS AS SProv ON Stake.[ProviderID] = SProv.[ID]
WHERE Stake.ID IN (
SELECT MAX(SMaxID.ID) AS ID
FROM SI_STAKES AS SMaxID
INNER JOIN (
SELECT StakeGroupTypeID, EventID, StakeTypeID, MAX(StakeValue) AS MaxStakeValue
FROM SI_STAKES
WHERE Stake.[EventID]=6
AND Stake.[StakeGroupTypeID]=1
GROUP BY StakeGroupTypeID, EventID, StakeTypeID
) AS SMaxVal ON SMaxID.StakeTypeID = SMaxVal.StakeTypeID
AND SMaxID.StakeValue = SMaxVal.MaxStakeValue
AND SMaxID.EventID = SMaxVal.EventID
AND SMaxID.StakeGroupTypeID = SMaxVal.StakeGroupTypeID
)
You can use the over clause since you're using T-SQL (hopefully 2005+):
select distinct
a.stakevalue,
max(a.stakevalue) over (partition by a.staketypeid) as maxvalue,
b.staketypeid,
c.providername
from
si_stakes a
inner join si_stakestypes b on
a.staketypeid = b.id
inner join si_providers c on
a.providerid = c.id
where
a.eventid = 6
and a.stakegrouptypeid = 1
Essentially, this will find the max a.stakevalue for each a.staketypeid. Using a distinct will return one and only one row. Now, if you wanted to include the min a.id along with it, you could use row_number to accomplish this:
select
s.id,
s.maxvalue,
s.staketypeid,
s.providername
from (
select
row_number() over (order by a.stakevalue desc
partition by a.staketypeid) as rownum,
a.id,
a.stakevalue as maxvalue,
b.staketypeid,
c.providername
from
si_stakes a
inner join si_stakestypes b on
a.staketypeid = b.id
inner join si_providers c on
a.providerid = c.id
where
a.eventid = 6
and a.stakegrouptypeid = 1
) s
where
s.rownum = 1