SQL group by count across two tables - sql

I have got two tables called baseline and revisits
baseline
formid-------NoOfIssues
1--------------3
2--------------4
3--------------5
revisits
id------formid-------NoOfIssues-----------date--------------fid
1---------2--------------4-------------5/06/2016------------1
2---------3--------------3-------------15/06/2016-----------1
3---------1--------------4-------------20/07/2016-----------1
4---------1--------------3-------------25/07/2016-----------1
5---------2--------------5-------------28/07/2016-----------1
6---------1--------------5-------------01/06/2016-----------1
7---------3--------------8-------------21/02/2016-----------1
8---------3--------------2-------------21/02/2016-----------2
These tables are joined by 'formid'.
I need to compare number of issues in baseline vs revisits(only first) and get a count as reduced, increased or equal
Based upon the above table i am expecting the following, for example across all three baseline entries no equals were found comparing NoOfissues in first revisit against same formid, but 1 equal and 2 increased were found
Addition: if same date and same formid is found than take the lower fid, so in the last two rows of revisits table both formid and date are equal but need to consider the lower formid which is 1
status----------Count
reduced----------0
equal------------1
increased--------2

I'm not familiar with intersystems-cache, but you can see if the following is valid SQL with that DB:
SELECT
CASE
WHEN BL.NoOfIssues = FR.NoOfIssues THEN 'equal'
WHEN BL.NoOfIssues > FR.NoOfIssues THEN 'reduced'
WHEN BL.NoOfIssues < FR.NoOfIssues THEN 'increased'
END AS status,
COUNT(*) AS Count
FROM
Baseline BL
INNER JOIN Revisits FR ON FR.formid = BL.formid
LEFT OUTER JOIN Revisits R ON
R.formid = BL.formid AND
(
R.date < FR.date OR
(R.date = FR.date AND R.fid > FR.fid)
)
WHERE
R.formid IS NULL
GROUP BY
CASE
WHEN BL.NoOfIssues = FR.NoOfIssues THEN 'equal'
WHEN BL.NoOfIssues > FR.NoOfIssues THEN 'reduced'
WHEN BL.NoOfIssues < FR.NoOfIssues THEN 'increased'
END
Some quick notes on your database though - You should probably decide on a standard of plural or singular table names and stick with it. Also, try to avoid common reserved words for object names, like date. Finally, if a revisit is basically the same as a visit, just on a later date then you should consider keeping them all in the same table.

I would do this in one row rather than three:
select sum(case when numissues = rcnt then 1 else 0 end) as equal,
sum(case when numissues > rcnt then 1 else 0 end) as reduced,
sum(case when numissues < rcnt then 1 else 0 end) as incrased
from (select b.form_id, b.numissues, count(r.form_id) as rcnt
from baseline b left join
revisits r
on b.form_id = r.form_id
group by b.form_id, b.numissues
) br;

I would use a window function to get the number of ussues at the minimum date, then compare that to the baseline number of issues
select
case when baseline.NoOfIssues = rev.NoOfIssues then 'equal'
when baseline.NoOfIssues > rev.NoOfIssues then 'reduced'
when baseline.NoOfIssues < rev.NoOfIssues then 'increased'
end as status,
count(*) as count
from baseline
inner join(
select
formid,
case when date = min(date) over(partition by formid) then NoOfIssues else null end as first_rev_issues
from revisits
) rev
on baseline.formid = rev.formid
group by
case when baseline.NoOfIssues = rev.NoOfIssues then 'equal'
when baseline.NoOfIssues > rev.NoOfIssues then 'reduced'
when baseline.NoOfIssues < rev.NoOfIssues then 'increased'
end

Or like this:
WITH CTE AS
(
SELECT
BL.FORMID,
BL.NOOFISSUES AS BLI,
RV.NOOFISSUES AS RVI,
RV.DATE,
ROW_NUMBER() OVER(PARTITION BY BL.FORMID ORDER BY RV.DATE) AS RN
FROM Baseline BL
INNER JOIN Revisits RV ON RV.FORMID = BL.FORMID
)
SELECT COALESCE(SUM(CASE WHEN RVI > BLI THEN 1 END), 0) AS INCREASED,
COALESCE(SUM(CASE WHEN RVI < BLI THEN 1 END), 0) AS DECREASED,
COALESCE(SUM(CASE WHEN RVI = BLI THEN 1 END), 0) AS EQUAL
FROM CTE
WHERE RN=1;

Related

How to nest multiple case when expressions and add a condition

I am trying to divide customers (contact_key) column who shopped in 2021 (A.TXN_MTH) into new and 'returning' with returning meaning that they had not shopped in the last 12 months (YYYYMM in X.Fiscal_mth_idnt column).
I am using CASE WHEN A.TXN_MTH = MIN(X.FISCAL_MTH_IDNT) THEN 'NEW' which is correct. The next case when should be when the max month before X.TXN_MTH is 12 or more months previous. I have added the 12 months part in the Where statement. Should I be nesting 3 CASE WHEN'S instead of WHERE?
SELECT
T.CONTACT_KEY
, A.TXN_MTH
, CASE WHEN A.TXN_MTH = MIN(X.FISCAL_MTH_IDNT) THEN 'NEW'
WHEN (MAX(CASE WHEN X.FISCAL_MTH_IDNT < A.TXN_MTH THEN X.FISCAL_MTH_IDNT ELSE NULL END)) THEN 'RETURNING'
END AS CUST_TYPE
FROM B_TRANSACTION T
INNER JOIN B_TIME X
ON T.TRANSACTION_DT_KEY = X.DATE_KEY
INNER JOIN A
ON A.CONTACT_KEY = T.CONTACT_KEY AND A.BU_KEY = T.BU_KEY
WHERE (MAX(CASE WHEN X.FISCAL_MTH_IDNT < A.TXN_MTH THEN X.FISCAL_MTH_IDNT ELSE NULL END)) < A.TXN_MTH - (date_format(add_months(concat_ws('-',substr(yearmonth,1,4),substr(yearmonth,5,2),'01'),-12),'yyyyMM')
GROUP BY
T.CONTACT_KEY
, TXN_MTH;
You have not described your tables, so assuming fiscal_mth_idnt is a DATE column then you can use the LAG analytic function to find the previous row's value:
SELECT contact_key,
txn_mth,
CASE
WHEN prev_fiscal_mth_idnt IS NULL
THEN 'NEW'
WHEN ADD_MONTHS(prev_fiscal_mth_idnt, 12) < fiscal_mth_idnt
THEN 'RETURNING'
ELSE 'CURRENT'
END AS cust_type
FROM (
SELECT T.CONTACT_KEY,
A.TXN_MTH,
yearmonth,
X.FISCAL_MTH_IDNT,
LAG(X.FISCAL_MTH_IDNT) OVER (
PARTITION BY T.CONTACT_KEY
ORDER BY X.FISCAL_MTH_IDNT
) AS prev_fiscal_mth_idnt
FROM B_TRANSACTION T
INNER JOIN B_TIME X
ON T.TRANSACTION_DT_KEY = X.DATE_KEY
INNER JOIN A
ON A.CONTACT_KEY = T.CONTACT_KEY AND A.BU_KEY = T.BU_KEY
)
WHERE yearmonth LIKE '2021%';

How to improve slow sql query with aggregate functions

I want to show top ten customers,sales,margin where customers is registred during this accounting year. The query takes about 65seconds to run and it is not accepted :-(
As you may see i am not good at sql and will be very happy for help to improve the query.
SELECT Top 10
AcTr.R3, Actor.Nm,
SUM(CASE WHEN AcTr.AcNo<='3999' THEN AcAm*-1 ELSE 0 END) AS Sales ,
SUM(AcAm*-1) AS TB
FROM AcTr, Actor
WHERE (Actor.CustNo = AcTr.R3) AND
(Actor.CustNo <> '0') AND
(Actor.CreDt >= '20180901') AND
(Actor.CreDt <= '20190430') AND
AcTr.AcYr = '2018' AND
AcTr.AcPr <= '8' AND
AcTr.AcNo>='3000' AND
AcTr.AcNo <= '4999'
GROUP BY AcTr.R3, Actor.Nm
ORDER BY Sales DESC
Welcome to the community. You have a good start, but future, it is more helpful if you can provide (as commented), the CREATE table declarations so users know the actual data types. Not always required, but helps.
As for your query layout, it is more common to show the JOIN syntax instead of WHERE showing relations between tables, but that comes in time and practice.
Indexes help and should be based on a combination of both WHERE/JOIN criteria AND Grouping fields. Also, if fields are numeric, then do not 'quote' them, just leave as numbers. For example, your AcYr, AcPr, AcNo. I would think that an account number really would be a string value vs number for accounting purposes.
I would suggest the following indexes on your tables
Table Index
Actr ( AcYr, AcPr, AcNo, R3 )
Actor ( CustNo, CreDt )
The Actr table I have the filtering criteria first and the R3 last to help optimize the GROUP BY. The Actor table by the customer number, then the CreDt (Create date??), and is it really a string, or is it a date field? If so, the date criteria would be something like '2018-09-01' and '2019-04-30'
select TOP 10
Actor.Nm,
PreSum.Sales,
PreSm.TB
from
( select
R3,
SUM(CASE WHEN AcTr.AcNo <= '3999'
THEN AcAm * -1 ELSE 0 END) AS Sales,
SUM( AcAm * -1) AS TB
from
Actr
where
AcTr.AcYr = 2018
AND AcTr.AcPr <= 8
AND AcTr.AcNo >= '3000'
AND AcTr.AcNo <= '4999'
GROUP BY
AcTr.R3 ) PreSum
JOIN Actor
on PreSum.R3 = Actor.CustNo
AND Actor.CustNo <> 0
AND Actor.CreDt >= '20180901'
AND Actor.CreDt <= '20190430'
order by
Sales DESC
Per latest inquiry / comment, wanting by year comparison and getting rid of the top 10 performers per a given time period.
select
Actor.Nm,
PreSum.Sales2018,
PreSum.Sales2019,
PreSum.TB2018,
PreSum.TB2019
from
( select
AcTr.R3,
SUM(CASE WHEN AcTr.AcYr = 2018
AND AcTr.AcNo <= '3999'
THEN AcAm * -1 ELSE 0 END) AS Sales2018,
SUM(CASE WHEN AcTr.AcYr = 2019 AND AcTr.AcNo <= '3999'
THEN AcAm * -1 ELSE 0 END) AS Sales2019,
SUM( CASE WHEN AcTr.AcYr = 2018
THEN AcAm * -1 else 0 end ) AS TB2018
SUM( CASE WHEN AcTr.AcYr = 2019
THEN AcAm * -1 else 0 end ) AS TB2019
from
Actr
where
AcTr.AcYr IN ( 2018, 2019 )
AND AcTr.AcPr <= 8
AND AcTr.AcNo >= '3000'
AND AcTr.AcNo <= '4999'
GROUP BY
AcTr.R3 ) PreSum
JOIN Actor
on PreSum.R3 = Actor.CustNo
AND Actor.CustNo <> 0
AND Actor.CreDt >= '20180901'
AND Actor.CreDt <= '20190430'
order by
Sales DESC

How to return count in 2 columns?

I have this query. It should return Count for both AWARDED (1) and NOT AWARDED(0) works from works table.
Select Count(w.WorkID)as Total, w.IsAwarded, org.OrganizationName
From Works w
Inner Join MC_MemberShip.Membership.Organization org
ON org.OrganizationID= w.Organization_ID
where Convert(varchar(11), w.OpeningDate) >= Convert(varchar(11), #FromDate)
and Convert(varchar(11), w.OpeningDate) < DATEADD(day, 1, Convert(varchar(11), #ToDate))
and w.IsActive=1 and
ISNULL(w.IsAwarded,0)= 0 and w.Organization_ID= case when #OrgID= -1 then w.Organization_ID else #OrgID end
group by org.OrganizationName, w.IsAwarded
Now this query returns Total count for NOT AWARDED i.e. 0 only but i want to return count for AWARDED too in same query.
Organization TotalAwardedWorks TotalNotAwardedWorks
Town 1 1 2
Town 2 44 33
Your query should look something like this:
select org.OrganizationName,
Count(*) as Total,
sum(case when w.IsAwarded = 0 or w.IsAwarded is null then 1 else 0 end) as TotalNotAward,
sum(case when w.IsAwarded = 1 then 0 else 1 end) as TotalAward
from Works w Inner Join
MC_MemberShip.Membership.Organization org
on org.OrganizationID = w.Organization_ID
where w.OpeningDate >= #FromDate and
w.OpeningDate < dateadd(day, 1, #ToDate) and
w.IsActive = 1 and
(w.Organization_ID = #OrgId or #OrgID= -1)
group by org.OrganizationName;
Notes:
Do not convert dates to strings to perform comparisons. That is just perverse.
Generally, the use of case in the where clause is discouraged. The logic is more clearly represented using or.
You can get what you want by using case to put conditions in the aggregation functions.

Conditional statements within select for strings

at the moment I query my database to find all interactions within a certain date parameter. I also join that table with my InteractionAttendees then join it by TargetGroups table. InteractionAttendees tells me who attended the interaction, TargetGroup tells me about the person, if they are part of a target group such as aboriginal or francophone.
I'm having trouble setting my Aboriginal and Francophone columns. I want to set it to be true if there was any attendees that are in the TargetGroup Table.
select CONVERT(date, getdate()) as ActivityDate,
CASE when Type = 0 Then 'General'
when Type = 1 Then 'Activity'
else 'Task' End as Type,
CASE when Indepth = 0 Then 'False' else 'True' End as Indepth,
Subject,
Comments,
'false' as Aboriginal,
'false' as FrancoPhone,
'false' as Female,
'false' as Youth,
'false' as Other
from Sarnia.dbo.Interactions as X
full outer join sarnia.dbo.InteractionAttendees as Y on X.Id = Y.Interaction_Id
full outer join Sarnia.dbo.TargetGroups as Z on Y.Person_Id = Z.PersonId
where ActivityDate >= '2015-07-01' and ActivityDate <= '2015-09-30'
group by ActivityDate, Type, Indepth, Created, subject, comments
for example
Interaction Table
Id
1
Interaction Attendee Table
Id InteractionId PersonId
1 1 5
2 1 10
TargetGroups Table
Id PersonId TargetValue
1 5 Aboriginal
2 10 Francophone
So my resulted table would be
Activity Date Aboriginal Francophone
---- True True
May I ask how do I properly populate my target group columns.
You can update your query something like this: Pivot in the old fashioned way. Based on your previous questions I am assuming it is a sql server DB.
MAX(IIF(TargetValue = 'Aboriginal', 'True', 'False')) as Aboriginal,
MAX(IIF(TargetValue = 'FrancoPhone', 'True', 'False')) as FrancoPhone
If you have SQL Server 2008 or older you can use something like this
MAX(CASE WHEN TargetValue = 'Aboriginal' THEN 'True' ELSE 'False' END) as Aboriginal,
MAX(CASE WHEN TargetValue = 'FrancoPhone' THEN 'True' ELSE 'False' END) as FrancoPhone
Here is an example of how to do this with a CTE. Note, I added this to the example sql you gave, but that sql has many errors.
with igroup as
(
SELECT IID,
CASE WHEN ACOUNT > 0 THEN true ELSE false END AS Aboriginal,
CASE WHEN FCOUNT > 0 THEN true ELSE false END AS Francophone
FROM (
SELECT InteractionId as IID,
SUM(CASE WHEN TargetValue = 'Aboriginal' THEN 1 ELSE 0 END) as ACOUNT,
SUM(CASE WHEN TargetValue = 'FrancoPhone' THEN 1 ELSE 0 END) as FCOUNT
FROM sarnia.dbo.InteractionAttendees A
LEFT JOIN sarnia.dbo.TargetGroups as G on A.Person_Id = G.PersonId
GROUP BY InteractionId
) X
)
select CONVERT(date, getdate()) as ActivityDate,
CASE when Type = 0 Then 'General'
when Type = 1 Then 'Activity'
else 'Task' End as Type,
CASE when Indepth = 0 Then 'False' else 'True' End as Indepth,
Subject,
Comments,
igroup.Aboriginal,
igroup.FrancoPhone,
from Sarnia.dbo.Interactions as X
full outer join sarnia.dbo.InteractionAttendees as Y on X.Id = Y.Interaction_Id
join igroup on X.Id = igroup.IID
where ActivityDate >= '2015-07-01' and ActivityDate <= '2015-09-30'
group by ActivityDate, Type, Indepth, Created, subject, comments

SQL - Dividing by Sum by Group

I am just learning SQL and have run into a problem creating a custom report. I am working with school attendance data. I want to create a report that gives membership days and number of days for each absence type.
I have successfully created a report for these separately.
Membership Days (calculated by counting days school was in session between the student's entry date and the current date. Membership days does not exist as a field on its own)
SELECT sum(case when cd.DATE_VALUE >= s.ENTRYDATE and cd.DATE_VALUE <= current_timestamp THEN cd.INSESSION ELSE 0 END), s.LASTFIRST
FROM CALENDAR_DAY cd,STUDENTS s
WHERE cd.SCHOOLID = 405
GROUP BY s.LASTFIRST
Count per absence type
SELECT s.STUDENT_NUMBER, s.LASTFIRST,SUM(CASE WHEN a.ATTENDANCE_CODEID = 2 THEN 1 ELSE 0 END),SUM(CASE WHEN a.ATTENDANCE_CODEID = 4 THEN 1 ELSE 0 END),SUM(CASE WHEN a.ATTENDANCE_CODEID = 3 THEN 1 ELSE 0 END),SUM(CASE WHEN a.ATTENDANCE_CODEID = 51 THEN 1 ELSE 0 END)
FROM ATTENDANCE a
INNER join STUDENTS s
ON a.STUDENTID = s.ID
WHERE a.att_date between '%param1%' and '%param2%'
GROUP BY s.STUDENT_NUMBER, s.LASTFIRST
The problem is that if I try to put these in the same report, the membership days are multiplied by the number of times the student appears in the attendance table due to joining student and attendance. My thought on a solution was to then divide this line
sum(case when cd.DATE_VALUE >= s.ENTRYDATE and cd.DATE_VALUE <= current_timestamp THEN cd.INSESSION ELSE 0 END)
by the number of times the student showed up in the attendance table to counteract the student information existing on every line. I can't figure out how to do that. I don't know much about these types of problems, so hopefully I've just gone off on the wrong track and there is an easy solution. Thanks.
Your problem is a common problem -- trying to summarize along two dimensions at the same time without using a subquery. You want to do this query with two aggregation subqueries. Something like this:
SELECT *
FROM (SELECT sum(case when cd.DATE_VALUE >= s.ENTRYDATE and cd.DATE_VALUE <= current_timestamp
THEN cd.INSESSION
ELSE 0
END), s. STUDENT_NUMBER
FROM CALENDAR_DAY cd CROSS JOIN
STUDENTS s
WHERE cd.SCHOOLID = 405
GROUP BY s.STUDENT_NUMBER
) sc JOIN
(SELECT s.STUDENT_NUMBER, s.LASTFIRST,
SUM(CASE WHEN a.ATTENDANCE_CODEID = 2 THEN 1 ELSE 0 END),
SUM(CASE WHEN a.ATTENDANCE_CODEID = 4 THEN 1 ELSE 0 END),
SUM(CASE WHEN a.ATTENDANCE_CODEID = 3 THEN 1 ELSE 0 END),
SUM(CASE WHEN a.ATTENDANCE_CODEID = 51 THEN 1 ELSE 0 END)
FROM ATTENDANCE a INNER join
STUDENTS s
ON a.STUDENTID = s.ID
WHERE a.att_date between '%param1%' and '%param2%'
GROUP BY s.STUDENT_NUMBER, s.LASTFIRST
) sa
on sc.STUDENT_NUMBER = sa.STUDENT_NUMBER;