Complex LEFT JOIN not working as expected


The DBMS is InterSystems Caché.
Here is my full query:
SELECT m.Name AS MessageType, COUNT(l.name) AS MessageCount, CAST(AVG(ResponseTime) AS DECIMAL(5, 2)) AS AvgResponseTime
FROM
(SELECT DISTINCT(name) FROM ENSLIB_HL7.Message) m LEFT JOIN
(
SELECT CAST(li.SessionId AS Bigint) AS session_id, li.name, MIN(li.TimeCreated) AS SessionStart, MAX(lo.TimeCreated) AS SessionEnd, CAST(DATEDIFF(s, MIN(li.TimeCreated), MAX(lo.TimeCreated)) AS DECIMAL(5, 2)) AS ResponseTime
FROM (SELECT h1.SessionId, h1.TimeCreated, $PIECE(RawContent, '|', 4), m1.name FROM ens.messageheader h1, ENSLIB_HL7.Message m1 WHERE h1.MessageBodyId = m1.id AND h1.TimeCreated > DATEADD(mi, -30, GETUTCDATE())) li
JOIN (SELECT h2.SessionId, h2.TimeCreated FROM ens.messageheader h2, ENSLIB_HL7.Message m2 WHERE h2.MessageBodyId = m2.id AND h2.TimeCreated > DATEADD(mi, -30, GETUTCDATE())) lo
ON li.SessionId = lo.SessionId
GROUP BY li.SessionId
) l on m.name = l.name
GROUP BY l.Name
This gives me 4 results:
VXU_V04 0 (null)
ADT_A03 3 0.01
ADT_A04 3 0.01
ADT_A08 143 0.01
Given that there is one result with 0 records, it seems like it is working. However, if I run SELECT DISTINCT(name) FROM ENSLIB_HL7.Message I get 10 results:
VXU_V04
ADT_A08
ACK_A08
ADT_A03
ACK_A03
ADT_A04
ACK_A04
ACK_V04
ADT_A01
ACK_A01
Why don't I get ten rows with my full query?

Change the GROUP BY to:
GROUP BY m.Name
You are grouping by the column from the second (LEFT JOINed) table, so every message type without a match, whose l.name is NULL, collapses into a single row.
Most databases would reject selecting m.Name when it is not in the GROUP BY, but apparently InterSystems Caché allows it.
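The effect of grouping by the right-hand column can be reproduced in a few lines. This is a sketch that uses SQLite (via Python's sqlite3) as a stand-in for Caché, because SQLite also tolerates bare non-grouped columns. The tables `message_type` and `session` and their rows are hypothetical, standing in for the DISTINCT subquery `m` and the derived table `l`:

```python
import sqlite3

def message_counts(group_col):
    # "l.name" reproduces the question's GROUP BY; "m.name" is the fix.
    assert group_col in ("m.name", "l.name")
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE message_type (name TEXT);  -- hypothetical stand-in for m
        CREATE TABLE session (name TEXT);       -- hypothetical stand-in for l
        INSERT INTO message_type VALUES ('VXU_V04'), ('ADT_A08'), ('ACK_A08'), ('ADT_A03');
        INSERT INTO session VALUES ('ADT_A08'), ('ADT_A08'), ('ADT_A03');
    """)
    rows = conn.execute(f"""
        SELECT m.name, COUNT(l.name)
        FROM message_type m
        LEFT JOIN session l ON m.name = l.name
        GROUP BY {group_col}
        ORDER BY m.name
    """).fetchall()
    conn.close()
    return rows
```

Grouping by `l.name` yields 3 rows, because VXU_V04 and ACK_A08 both have a NULL `l.name` and land in one group. Grouping by `m.name` yields one row per message type, with a count of 0 for the unmatched ones.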

Related

Grouping the result set based on conditions

I am calculating Age of a user based on his date of birth.
select UserId, (Convert(int,Convert(char(8),GETDATE(),112))-Convert(char(8),[DateOfBirth],112))/10000 AS [Age] FROM dbo.[User]
This gives me the UserId and his age.
Now I want to group this result.
How many users are in their 30s, how many in their 40s, and how many in their 50s? I need the count of users for each age group.
If the user's age is > 0 and < 30, they should be grouped into the 20s.
If the age is >= 30 and < 40, they should be added to the 30s, and the same goes for the 40s and 50s.
Can this be achieved without creating any temp table?

I believe this will get you what you want.
Anything < 30 will be placed in the '20' group.
Anything >= 50 will be placed in the '50' group.
If they are 30-39 or 40-49, they will be placed in their appropriate decade group.
SELECT y.AgeDecade, COUNT(*)
FROM dbo.[User] u
CROSS APPLY (SELECT Age = (CONVERT(int, CONVERT(char(8), GETDATE(), 112)) - CONVERT(int, CONVERT(char(8), u.DateOfBirth, 112))) / 10000) x
CROSS APPLY (SELECT AgeDecade = CASE
WHEN x.Age <= 29 THEN 20
WHEN x.Age BETWEEN 30 AND 39 THEN 30
WHEN x.Age BETWEEN 40 AND 49 THEN 40
WHEN x.Age >= 50 THEN 50
ELSE NULL
END
) y
GROUP BY y.AgeDecade
Placing the logic in a CROSS APPLY makes it easy to reuse within the same query: you can reference it in SELECT, GROUP BY, ORDER BY, WHERE, etc., without duplicating it. This could also be done with a CTE, but in this scenario it is my preference.
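To see why the two chained CROSS APPLY steps produce the right decade, here is a sketch of the same arithmetic in Python. Formatting a date as a yyyymmdd integer mirrors `CONVERT(char(8), d, 112)`. Subtracting the two integers and dividing by 10000 counts completed years, because the month/day digits only "borrow" a year when this year's birthday hasn't happened yet. The function names are illustrative only:

```python
from datetime import date

def age_in_years(dob: date, today: date) -> int:
    # Encode both dates as yyyymmdd integers, like style 112 in CONVERT.
    # Floor division matches SQL Server's integer division for non-negative ages.
    return (int(today.strftime("%Y%m%d")) - int(dob.strftime("%Y%m%d"))) // 10000

def age_decade(age: int) -> int:
    # Same buckets as the CASE expression: under 30 -> 20, 50 and over -> 50.
    if age <= 29:
        return 20
    if age <= 39:
        return 30
    if age <= 49:
        return 40
    return 50
```

For example, someone born on 1990-06-15 is 29 on 2020-06-14 (20200614 - 19900615 = 299999) and 30 one day later.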
Update:
You asked in the comments how it would be possible to show a count of 0 when no people exist for an age group. In most cases the answer is simple: LEFT JOIN. As with everything, there's always more than one way to bake a cake.
Here are a couple of ways you can do it:
The simple LEFT JOIN: take the query from my original answer and LEFT JOIN it to a table of age groups. You could do this in a couple of ways (CTE, temp table, table variable, sub-query, etc.). The takeaway is that you need to isolate your User table somehow.
Simple sub-query method, nothing fancy: I just put the whole query into a sub-query, then LEFT JOINed it to our new lookup table.
DECLARE @AgeGroup TABLE (AgeGroupID tinyint NOT NULL);
INSERT INTO @AgeGroup (AgeGroupID) VALUES (20),(30),(40),(50);
SELECT ag.AgeGroupID, TotalCount = COUNT(a.AgeDecade)
FROM @AgeGroup ag
LEFT JOIN (
SELECT y.AgeDecade
FROM #User u
CROSS APPLY (SELECT Age = (CONVERT(int, CONVERT(char(8), GETDATE(), 112)) - CONVERT(int, CONVERT(char(8), u.DateOfBirth, 112))) / 10000) x
CROSS APPLY (SELECT AgeDecade = CASE
WHEN x.Age <= 29 THEN 20
WHEN x.Age BETWEEN 30 AND 39 THEN 30
WHEN x.Age BETWEEN 40 AND 49 THEN 40
WHEN x.Age >= 50 THEN 50
ELSE NULL
END
) y
) a ON a.AgeDecade = ag.AgeGroupID
GROUP BY ag.AgeGroupID;
This would be the exact same thing as using a cte:
DECLARE @AgeGroup TABLE (AgeGroupID tinyint NOT NULL);
INSERT INTO @AgeGroup (AgeGroupID) VALUES (20),(30),(40),(50);
WITH cte_Users AS (
SELECT y.AgeDecade
FROM #User u
CROSS APPLY (SELECT Age = (CONVERT(int, CONVERT(char(8), GETDATE(), 112)) - CONVERT(int, CONVERT(char(8), u.DateOfBirth, 112))) / 10000) x
CROSS APPLY (SELECT AgeDecade = CASE
WHEN x.Age <= 29 THEN 20
WHEN x.Age BETWEEN 30 AND 39 THEN 30
WHEN x.Age BETWEEN 40 AND 49 THEN 40
WHEN x.Age >= 50 THEN 50
ELSE NULL
END
) y
)
SELECT ag.AgeGroupID, TotalCount = COUNT(a.AgeDecade)
FROM @AgeGroup ag
LEFT JOIN cte_Users a ON a.AgeDecade = ag.AgeGroupID
GROUP BY ag.AgeGroupID;
Choosing between the two is purely preference. There's no performance gain to using a CTE here.
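The key to the zero counts is that the lookup table drives the LEFT JOIN and the query counts a column from the joined side, so unmatched groups count NULLs and give 0. Below is a minimal sketch of that pattern in SQLite via Python. It assumes a simplified hypothetical `users` table that already stores an age, so the yyyymmdd arithmetic is left out:

```python
import sqlite3

def decade_counts(ages):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (age INTEGER)")  # hypothetical, age precomputed
    conn.executemany("INSERT INTO users VALUES (?)", [(a,) for a in ages])
    rows = conn.execute("""
        WITH age_group(id) AS (VALUES (20), (30), (40), (50)),
             user_decade AS (
               SELECT CASE WHEN age <= 29 THEN 20
                           WHEN age <= 39 THEN 30
                           WHEN age <= 49 THEN 40
                           ELSE 50 END AS decade
               FROM users)
        SELECT ag.id, COUNT(u.decade)   -- COUNT(column) ignores NULLs -> 0
        FROM age_group ag
        LEFT JOIN user_decade u ON u.decade = ag.id
        GROUP BY ag.id
        ORDER BY ag.id
    """).fetchall()
    conn.close()
    return rows
```

Using `COUNT(*)` instead would return 1 for the empty groups, because each lookup row still appears once in the join output.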
Bonus:
If you want to table-drive your groups and also show 0 counts, you could do something like the following, though I will warn you to be careful with APPLY operators, because they can sometimes be tricky for performance.
IF OBJECT_ID('tempdb..#User','U') IS NOT NULL DROP TABLE #User; --SELECT * FROM #User
WITH c1 AS (SELECT x.x FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) x(x)) -- 10
, c2(x) AS (SELECT 1 FROM c1 x CROSS JOIN c1 y) -- 10 * 10
SELECT UserID = IDENTITY(int,1,1)
, DateOfBirth = CONVERT(date, GETDATE()-(RAND(CHECKSUM(NEWID()))*18500))
INTO #User
FROM c2 u;
IF OBJECT_ID('tempdb..#AgeRange','U') IS NOT NULL DROP TABLE #AgeRange; --SELECT * FROM #AgeRange
CREATE TABLE #AgeRange (
AgeRangeID tinyint NOT NULL IDENTITY(1,1),
RangeStart tinyint NOT NULL,
RangeEnd tinyint NOT NULL,
RangeLabel varchar(100) NOT NULL
);
INSERT INTO #AgeRange (RangeStart, RangeEnd, RangeLabel)
VALUES ( 0, 29, '0 - 29')
, (30, 39, '30 - 39')
, (40, 49, '40 - 49')
, (50, 255, '50+');
-- Using an OUTER APPLY
SELECT ar.RangeLabel, COUNT(y.UserID)
FROM #AgeRange ar
OUTER APPLY (
SELECT u.UserID
FROM #User u
CROSS APPLY (SELECT Age = (CONVERT(int, CONVERT(char(8), GETDATE(), 112)) - CONVERT(int, CONVERT(char(8), u.DateOfBirth, 112))) / 10000) x
WHERE x.Age BETWEEN ar.RangeStart AND ar.RangeEnd
) y
GROUP BY ar.RangeLabel, ar.RangeStart
ORDER BY ar.RangeStart;
-- Using a CTE
WITH cte_users AS (
SELECT u.UserID
, Age = (CONVERT(int, CONVERT(char(8), GETDATE(), 112)) - CONVERT(int, CONVERT(char(8), u.DateOfBirth, 112))) / 10000
FROM #User u
)
SELECT ar.RangeLabel, COUNT(u.UserID)
FROM #AgeRange ar
LEFT JOIN cte_users u ON u.Age BETWEEN ar.RangeStart AND ar.RangeEnd
GROUP BY ar.RangeStart, ar.RangeLabel
ORDER BY ar.RangeStart;
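The CTE variant joins on a range (`BETWEEN`) instead of an equality, so adding or changing a bucket is a data change rather than a query change. Here is a sketch of that non-equi LEFT JOIN in SQLite via Python. It assumes hypothetical `age_range` and `users` tables, with the age already computed:

```python
import sqlite3

def range_counts(ages):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE age_range (range_start INT, range_end INT, label TEXT);
        INSERT INTO age_range VALUES (0, 29, '0 - 29'), (30, 39, '30 - 39'),
                                     (40, 49, '40 - 49'), (50, 255, '50+');
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INT);  -- hypothetical
    """)
    conn.executemany("INSERT INTO users (age) VALUES (?)", [(a,) for a in ages])
    rows = conn.execute("""
        SELECT ar.label, COUNT(u.user_id)
        FROM age_range ar
        LEFT JOIN users u ON u.age BETWEEN ar.range_start AND ar.range_end
        GROUP BY ar.range_start, ar.label
        ORDER BY ar.range_start
    """).fetchall()
    conn.close()
    return rows
```

Because BETWEEN is inclusive at both ends, the ranges must not overlap, or a user would be counted in two buckets.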

I would start by putting the age computation in a lateral join, so it can easily be referred to. Then, if you want the age groups as rows, you can join a derived table that describes the intervals:
select v.age_group, count(*) as cnt_users
from dbo.[User] u
cross apply (values
((convert(int, convert(char(8), getdate(),112)) - convert(char(8), u.[DateOfBirth], 112))/10000)
) a(age)
inner join (values
( 0, 30, '0-30'),
(30, 40, '30-40'),
(40, 50, '40-50'),
(50, null, '50+')
) v(min_age, max_age, age_group)
on a.age >= v.min_age
and (a.age < v.max_age or v.max_age is null)
group by v.age_group
On the other hand, if you want the counts in columns, use conditional aggregation:
select
sum(case when a.age < 30 then 1 else 0 end) as age_0_30,
sum(case when a.age >= 30 and a.age < 40 then 1 else 0 end) as age_30_40,
sum(case when a.age >= 40 and a.age < 50 then 1 else 0 end) as age_40_50,
sum(case when a.age >= 50 then 1 else 0 end) as age_50
from dbo.[User] u
cross apply (values
((convert(int, convert(char(8), getdate(),112)) - convert(char(8), [DateOfBirth], 112))/10000)
) a(age)
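Conditional aggregation turns each bucket into its own column: every row contributes 1 to exactly one SUM and 0 to the others. A minimal SQLite sketch via Python, assuming a hypothetical `users` table with the age already computed:

```python
import sqlite3

def age_columns(ages):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (age INTEGER)")  # hypothetical, age precomputed
    conn.executemany("INSERT INTO users VALUES (?)", [(a,) for a in ages])
    row = conn.execute("""
        SELECT SUM(CASE WHEN age < 30 THEN 1 ELSE 0 END)              AS age_0_30,
               SUM(CASE WHEN age >= 30 AND age < 40 THEN 1 ELSE 0 END) AS age_30_40,
               SUM(CASE WHEN age >= 40 AND age < 50 THEN 1 ELSE 0 END) AS age_40_50,
               SUM(CASE WHEN age >= 50 THEN 1 ELSE 0 END)              AS age_50
        FROM users
    """).fetchone()
    conn.close()
    return row
```

On an empty table each SUM returns NULL rather than 0, so you may want to wrap each one in COALESCE.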

Yes, you can.
This query should work for you:
SELECT LTRIM(STR(ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1))) + 's' AS [Age Group], COUNT(UserId) AS Count
FROM dbo.[User]
GROUP BY LTRIM(STR(ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1))) + 's'
For your updated question:
SELECT CASE
WHEN ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1) < 30 THEN '20s'
WHEN ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1) >= 50 THEN '50s'
ELSE LTRIM(STR(ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1))) + 's'
END AS [Age Group], COUNT(UserId) AS Count
FROM dbo.[User]
GROUP BY CASE
WHEN ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1) < 30 THEN '20s'
WHEN ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1) >= 50 THEN '50s'
ELSE LTRIM(STR(ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1))) + 's'
END
Note that DATEDIFF(year, ...) counts calendar-year boundaries crossed, so it overstates the age of anyone whose birthday has not yet occurred this year.

You could use ROUND with a length argument of -1 and a non-zero function argument to truncate the value to tens, and group by it:
SELECT Round((Convert(int,Convert(char(8),GETDATE(),112))-Convert(char(8),[DateOfBirth],112))/10000, -1, 1) AS [Rounded Age],
Count(*)
FROM dbo.[User]
GROUP BY Round((Convert(int,Convert(char(8),GETDATE(),112))-Convert(char(8),[DateOfBirth],112))/10000, -1, 1)
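The third argument matters. `ROUND(age, -1)` rounds to the nearest ten, so ages 35 to 39 become 40. `ROUND(age, -1, 1)` truncates, so they stay in the 30s, which is what a decade bucket needs. Here are two small Python equivalents for non-negative integers, given only as an illustration (the function names are made up):

```python
def truncate_to_tens(age: int) -> int:
    # Like ROUND(age, -1, 1): drop the units digit.
    return age - age % 10

def round_to_tens(age: int) -> int:
    # Like ROUND(age, -1) for non-negative integers: halves round up (away from zero).
    return (age + 5) // 10 * 10
```

For example, `truncate_to_tens(37)` gives 30, while `round_to_tens(37)` gives 40.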

Showing a list of all 24 hours in SQL Server even if there is no data

I have a query where I need to show the calls for each of the 24 hours of every day.
But I am only getting the hours in which there were calls.
My requirement is to get every hour listed, with 0 where there are no calls.
Please suggest a solution.
Below is my code.
select @TrendStartDate
,isd.Name
,isd.Call_ID
,isd.callType
,DATEPART(HOUR,isd.ArrivalTime)
from [PHONE_CALLS] ISD WITH (NOLOCK)
WHERE CallType = 'Incoming'
and Name not in ('DefaultQueue')
and CAST(ArrivalTime as DATE) between @TrendStartDate and @TrendEndDate

The basic idea is that you use a table containing numbers from 0 to 23, and left join that to your data table:
WITH CTE AS
(
SELECT TOP 24 ROW_NUMBER() OVER(ORDER BY @@SPID) - 1 As TheHour
FROM sys.objects
)
SELECT @TrendStartDate
,isd.Name
,isd.Call_ID
,isd.callType
,TheHour
FROM CTE
LEFT JOIN [PHONE_CALLS] ISD WITH (NOLOCK)
ON DATEPART(HOUR,isd.ArrivalTime) = TheHour
AND CallType = 'Incoming'
AND Name NOT IN ('DefaultQueue')
AND CAST(ArrivalTime as DATE) BETWEEN @TrendStartDate AND @TrendEndDate
If you have a tally table, you should use that. If not, the cte will provide you with numbers from 0 to 23.

If you have a numbers table you can use a query like the following:
SELECT d.Date,
h.Hour,
Calls = COUNT(pc.Call_ID)
FROM ( SELECT [Hour] = Number
FROM dbo.Numbers
WHERE Number >= 0
AND Number < 24
) AS h
CROSS JOIN
( SELECT Date = DATEADD(DAY, Number, @TrendStartDate)
FROM dbo.Numbers
WHERE Number <= DATEDIFF(DAY, @TrendStartDate, @TrendEndDate)
) AS d
LEFT JOIN [PHONE_CALLS] AS pc
ON pc.CallType = 'Incoming'
AND pc.Name NOT IN ('DefaultQueue')
AND CAST(pc.ArrivalTime AS DATE) = d.Date
AND DATEPART(HOUR, pc.ArrivalTime) = h.Hour
GROUP BY d.Date, h.Hour
ORDER BY d.Date, h.Hour;
The key is to get all the hours you need:
SELECT [Hour] = Number
FROM dbo.Numbers
WHERE Number >= 0
AND Number < 24
And all the days that you need in your range:
SELECT Date = DATEADD(DAY, Number, @TrendStartDate)
FROM dbo.Numbers
WHERE Number <= DATEDIFF(DAY, @TrendStartDate, @TrendEndDate)
Then cross join the two, so that you are guaranteed to have all 24 hours for each day you want. Finally, you can left join to your call table to get the count of calls.
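The same days × hours grid can be sketched end to end in SQLite via Python. Here recursive CTEs stand in for `dbo.Numbers`, since SQLite has no numbers table. The `phone_calls` table, its columns and the sample rows are hypothetical simplifications of `[PHONE_CALLS]`:

```python
import sqlite3

def hourly_call_counts(calls, start_date, end_date):
    """Return (date, hour, count) for every hour of every day in [start_date, end_date]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phone_calls (call_id INTEGER, arrival_time TEXT)")
    conn.executemany("INSERT INTO phone_calls VALUES (?, ?)", calls)
    rows = conn.execute("""
        WITH RECURSIVE
          hours(h) AS (SELECT 0 UNION ALL SELECT h + 1 FROM hours WHERE h < 23),
          days(d)  AS (SELECT date(?) UNION ALL
                       SELECT date(d, '+1 day') FROM days WHERE d < date(?))
        SELECT days.d, hours.h, COUNT(pc.call_id)
        FROM days
        CROSS JOIN hours                      -- every hour of every day
        LEFT JOIN phone_calls pc              -- keep empty hours
          ON date(pc.arrival_time) = days.d
         AND CAST(strftime('%H', pc.arrival_time) AS INTEGER) = hours.h
        GROUP BY days.d, hours.h
        ORDER BY days.d, hours.h
    """, (start_date, end_date)).fetchall()
    conn.close()
    return rows
```

As in the answer, the filter conditions on the call table belong in the ON clause. Putting them in WHERE would discard the NULL rows and bring back the original problem.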

You can use a recursive CTE in SQL Server to generate the hours 0 to 23, and then a LEFT OUTER JOIN to the call table.
You could also use any other method of generating the numbers 0 to 23.
set dateformat ymd
declare @calls as table(date date,hour int,calls int)
insert into @calls values('2020-01-02',0,66),('2020-01-02',1,888),
('2020-01-02',2,5),('2020-01-02',3,8),
('2020-01-02',4,9),('2020-01-02',5,55),('2020-01-02',6,44),('2020-01-02',7,87),('2020-01-02',8,90),
('2020-01-02',9,34),('2020-01-02',10,22),('2020-01-02',11,65),('2020-01-02',12,54),('2020-01-02',13,78),
('2020-01-02',23,99);
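-- anchor: one row (hour 0) per distinct date; the recursive part adds hours 1 to 23
with cte as (select distinct 0 n, date from @calls union all select 1 + n, date from cte where 1 + n < 24)
select cte.date, cte.n [Hour], isnull(ca.calls, 0) calls
from cte left outer join @calls ca on cte.n = ca.hour and cte.date = ca.date
order by cte.date, cte.n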

How to use Aggregate in Select Statement

I have a problem with my code: it seems SQL Server doesn't accept an aggregate function inside another aggregate in my SELECT query. With DATEDIFF(MONTH,MIN(LoanDateStart),MAX(LoanPaymentDue)), I am trying to get the total number of months and then use that number in the rest of the calculation.
I got this error:
Msg 130, Level 15, State 1, Line 11
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
Is there any other way I could achieve this?
Query
SELECT
ISNULL(SUM((CAST(((((lt.InterestRate/100) * lc.LoanAmount) +
lc.LoanAmount) / ((dbo.fnNumberOfYears(CONVERT(VARCHAR(15), LoanDateStart,
101), CONVERT(VARCHAR(15), LoanPaymentDue, 101)) * DATEDIFF(MONTH,
MIN(LoanDateStart), MAX(LoanPaymentDue))) * 2)) AS DECIMAL(18,2)))),0)
FROM LoanContract lc
INNER JOIN LoanType lt ON lt.LoanTypeID = lc.LoanTypeID
WHERE lc.LoanTypeID = 1 AND lc.EmployeeID = 5

Please try the following...
SELECT ISNULL(SUM(CAST((((lt.InterestRate / 100) * LoanContract.LoanAmount) + LoanContract.LoanAmount)
                       / ((dbo.fnNumberOfYears(CONVERT(VARCHAR(15), LoanDateStart, 101),
                                               CONVERT(VARCHAR(15), LoanPaymentDue, 101))
                           * monthsDifference) * 2)) AS DECIMAL(18, 2))), 0)
FROM ( SELECT LoanContractID AS LoanContractID,
DATEDIFF( MONTH,
MIN( LoanDateStart ),
MAX( LoanPaymentDue ) ) AS monthsDifference
FROM LoanContract
GROUP BY LoanContractID
) AS monthsDifferenceFinder
INNER JOIN LoanContract ON LoanContract.LoanContractID = monthsDifferenceFinder.LoanContractID
INNER JOIN LoanType lt ON lt.LoanTypeID = LoanContract.LoanTypeID
WHERE LoanContract.LoanTypeID = 1
AND LoanContract.EmployeeID = 5
Please note that I have used LoanContractID in place of the primary key for LoanContract, as you have not stated what that is.
The cause of your problem was that SUM() (an aggregate function) was operating on the results of MIN() and MAX(), which are themselves aggregate functions. SQL Server does not allow an aggregate over an expression that contains another aggregate, hence Msg 130.
I have worked around this by writing a subquery that determines the difference between your two values for each LoanContract, based on the unique identifier for that LoanContract as opposed to the primary key for each record. (Is your data in 3NF (Third Normal Form)?)
The results of the subquery are then joined to LoanContract based on LoanContractID, effectively appending each LoanContract's monthsDifference value to each record for that LoanContract.
SUM() can then work on a retrieved value rather than trying to aggregate already-aggregated values.
If you have any questions or comments, then please feel free to post a Comment accordingly.

In the end I fixed the issue with a LEFT OUTER JOIN to a derived table that computes the number of months per employee and loan type.
Here is my code:
SELECT
ISNULL(SUM((CAST(((((lt.InterestRate/100) * lc.LoanAmount) + lc.LoanAmount) / ((dbo.fnNumberOfYears(CONVERT(VARCHAR(15), LoanDateStart, 101), CONVERT(VARCHAR(15), LoanPaymentDue, 101)) * x.NumberOfMonths) * 2)) AS DECIMAL(18,2)))),0)
FROM LoanContract lc
INNER JOIN LoanType lt ON lt.LoanTypeID = lc.LoanTypeID
LEFT OUTER JOIN(SELECT
lc.LoanTypeID AS 'LoanTypeID',
lc.EmployeeID AS 'EmployeeID',
(DATEDIFF(MONTH, MIN(LoanDateStart), MAX(LoanPaymentDue))) AS 'NumberOfMonths'
FROM LoanContract lc
INNER JOIN LoanType lt ON lt.LoanTypeID = lc.LoanTypeID
GROUP BY lc.EmployeeID, lc.LoanTypeID) x ON x.EmployeeID = lc.EmployeeID AND x.LoanTypeID = lc.LoanTypeID
WHERE lc.LoanTypeID = 1 AND lc.EmployeeID = 5
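Both fixes follow the same pattern. They compute the MIN/MAX span once per group in a derived table, then join it back so that SUM only sees plain column values. The sketch below shows this in SQLite via Python, which likewise rejects nested aggregates. The `loan` table is hypothetical, and it stores integer month numbers (`start_month`, `due_month`) as an assumed simplification of `DATEDIFF(MONTH, ...)` and `fnNumberOfYears`:

```python
import sqlite3

def loan_total(employee_id, loan_type_id):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE loan (employee_id INT, loan_type_id INT, amount REAL,
                           start_month INT, due_month INT);   -- hypothetical
        INSERT INTO loan VALUES (5, 1, 1000, 0, 12), (5, 1, 500, 6, 24),
                                (7, 1, 800, 0, 6);
    """)
    # Nesting MIN/MAX inside SUM is rejected (SQL Server raises Msg 130).
    try:
        conn.execute("SELECT SUM(amount / (MAX(due_month) - MIN(start_month))) FROM loan")
        nested_ok = True
    except sqlite3.Error:
        nested_ok = False
    # Pre-aggregate the month span per (employee, loan type), then join it back.
    total = conn.execute("""
        SELECT SUM(l.amount / x.months)
        FROM loan l
        JOIN (SELECT employee_id, loan_type_id,
                     MAX(due_month) - MIN(start_month) AS months
              FROM loan
              GROUP BY employee_id, loan_type_id) x
          ON x.employee_id = l.employee_id AND x.loan_type_id = l.loan_type_id
        WHERE l.loan_type_id = ? AND l.employee_id = ?
    """, (loan_type_id, employee_id)).fetchone()[0]
    conn.close()
    return nested_ok, total
```

For employee 5 the span is 24 months (0 to 24), so the total is (1000 + 500) / 24 = 62.5.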

Join data from hourly and 30 minute tables

I have a table with data every hour, and a table with data every 30 minutes.
I would like to join these two tables (by date) to get batVolt and TA in the same table, repeating each hourly batVolt value for both 30-minute readings within that hour.

SELECT *
FROM HourTable t
INNER JOIN HalfHourTable ht
ON CAST(t.repDate AS Date) = CAST(ht.repDate AS Date)
AND DATEPART(HOUR, t.repDate) = DATEPART(HOUR, ht.repDate)
Edit
Your query should be
SELECT n.repDate
, n.TA
, a.batVolt
FROM [DAP].[dbo].[ARRMet] AS n
FULL JOIN [DAP].[dbo].[array3] AS a
ON DATEPART(HOUR, n.repDate) = DATEPART(HOUR, a.repDate)
AND CAST(n.repDate AS DATE) = CAST(a.repDate AS DATE)
WHERE CAST(n.repDate AS DATE) = '20150831'
ORDER BY n.repDate DESC

I would do this slightly differently than M.Ali. This uses fewer functions and seems a bit simpler to me.
SELECT *
FROM HourTable t
INNER JOIN HalfHourTable ht on
t.repDate = dateadd(hour, datediff(hour, 0, ht.repDate), 0)
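`DATEADD(hour, DATEDIFF(hour, 0, x), 0)` truncates a timestamp to the start of its hour, so both half-hour readings map onto the same hourly row. The same join logic is sketched below in plain Python, assuming the hourly data is a dict keyed by hour start and the half-hourly data is a list of `(timestamp, TA)` pairs. Both shapes are hypothetical:

```python
from datetime import datetime

def join_half_hour_to_hour(hourly, half_hourly):
    """hourly: {hour_start: batVolt}; half_hourly: [(timestamp, TA)] -> [(timestamp, TA, batVolt)]."""
    rows = []
    for ts, ta in half_hourly:
        # Truncate to the hour, like DATEADD(hour, DATEDIFF(hour, 0, ts), 0).
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        if hour_start in hourly:  # inner join: drop readings with no hourly row
            rows.append((ts, ta, hourly[hour_start]))
    return rows
```

Each hourly batVolt value therefore appears once for the :00 reading and once for the :30 reading.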

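You could compare the timestamps truncated to the hour. DATEPART() only returns a single part, so there is no "yyyymmddhh" datepart, but FORMAT() (SQL Server 2012 and later) can build that string:
select ta.repDate, ta.code, ta.TA, bat.batVolt
from table2 ta
join table1 bat
on FORMAT(bat.repDate, 'yyyyMMddHH') = FORMAT(ta.repDate, 'yyyyMMddHH')
FORMAT() is convenient, but it is relatively slow on large tables.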

You can try the following:
SELECT x.*, y.*
FROM BatVoltDetails x
INNER JOIN TADetails y
ON LEFT(CONVERT(VARCHAR, x.repDate, 120), 13) = LEFT(CONVERT(VARCHAR, y.repDate, 120), 13)

Returning unique results in a joined select

I need help with a query that checks the msdb database for SQL Server Agent job results. My query is as follows:
SELECT CONVERT(VARCHAR(30), Serverproperty('ServerName')),
a.run_status,
b.run_requested_date,
c.name,
CASE c.enabled
WHEN 1 THEN 'Enabled'
ELSE 'Disabled'
END,
CONVERT(VARCHAR(10), CONVERT(DATETIME, Rtrim(19000101))+(a.run_duration *
9 +
a.run_duration % 10000 * 6 + a.run_duration % 100 * 10) / 216e4, 108),
b.next_scheduled_run_date
FROM (msdb.dbo.sysjobhistory a
LEFT JOIN msdb.dbo.sysjobactivity b
ON b.job_history_id = a.instance_id)
JOIN msdb.dbo.sysjobs c
ON b.job_id = c.job_id
ORDER BY c.name
So far so good, but it returns several rows for the same job, one for each time the job has run. This is no good: I want only one row per job, and only the latest.
If I add this clause:
WHERE b.session_id=(SELECT MAX(session_id) from msdb.dbo.sysjobactivity)
it works better, but then it only lists jobs from the latest session_id. This excludes jobs that haven't run for a while, so it is no good either.
Can someone help me with this?
I have tried with DISTINCT and/or GROUP BY but can't get it to work.

with cte
AS (SELECT
Convert(varchar(30), SERVERPROPERTY('ServerName')) AS ServerName,
a.run_status,
b.run_requested_date,
c.name,
CASE c.enabled
WHEN 1 THEN 'Enabled'
Else 'Disabled'
END
AS Enabled,
CONVERT(VARCHAR(10), CONVERT(DATETIME, RTRIM(19000101))+(a.run_duration
* 9 +
a.run_duration % 10000 * 6 + a.run_duration % 100 * 10) / 216e4, 108)
AS run_duration,
b.next_scheduled_run_date,
ROW_NUMBER() over (partition by b.job_id ORDER BY b.run_requested_date
DESC) AS RN
FROM (msdb.dbo.sysjobhistory a
LEFT JOIN msdb.dbo.sysjobactivity b
ON b.job_history_id = a.instance_id)
join msdb.dbo.sysjobs c
on b.job_id = c.job_id)
SELECT *
FROM cte
WHERE RN = 1
ORDER BY name
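`ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC)` numbers each job's runs from newest to oldest, and `WHERE RN = 1` keeps only the newest. Unlike `MAX(session_id)`, this does not drop jobs that haven't run recently. Below is a small SQLite sketch via Python (window functions need SQLite 3.25 or later). The `job_run` table is a hypothetical flattening of the msdb tables:

```python
import sqlite3

def latest_run_per_job(runs):
    """runs: [(job_name, run_requested_date, run_status)] -> newest row per job."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job_run (job_name TEXT, run_requested_date TEXT, run_status INT)")
    conn.executemany("INSERT INTO job_run VALUES (?, ?, ?)", runs)
    rows = conn.execute("""
        WITH ranked AS (
          SELECT job_name, run_requested_date, run_status,
                 ROW_NUMBER() OVER (PARTITION BY job_name
                                    ORDER BY run_requested_date DESC) AS rn
          FROM job_run)
        SELECT job_name, run_requested_date, run_status
        FROM ranked
        WHERE rn = 1
        ORDER BY job_name
    """).fetchall()
    conn.close()
    return rows
```

If two runs of a job share the same timestamp, ROW_NUMBER picks one of them arbitrarily. Add a tie-breaker column to the ORDER BY if that matters.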
