Historical sum for current group - sql

I have a SQL database with people, from which I want to see how much experience each person has in a specific department.
In the current query I have the following code:
SELECT [PERSON_ID]
,sum(case when [DEPARTMENT] = 'Marketing' then 1 else 0 end) as Exp_Marketing
,sum(case when [FUNCTION_DESC] = 'Finance' then 1 else 0 end) as Exp_Finance
FROM [xxxx].[xxxx].[xxxx]
GROUP BY [PERSON_ID]
Each person has one row for the months of service, so a person with 12 months of experience in Finance has a value of 12 in the Exp_Finance column.
The issue however is that the result now shows the outcome for all people. Also the one who already left the organization. How can I make sure the result only shows the historical information for the people currently part in the organization. In other words, the ones actually having a row with "2018M06" as value for the Date column.

You can use a having clause:
SELECT [PERSON_ID],
sum(case when [DEPARTMENT] = 'Marketing' then 1 else 0 end) as Exp_Marketing,
sum(case when [FUNCTION_DESC] = 'Finance' then 1 else 0 end) as Exp_Finance
FROM [xxxx].[xxxx].[xxxx]
GROUP BY [PERSON_ID]
HAVING MAX([DATE]) = '2018M06';
Your month format seems amenable to using MAX().

You should add an EXISTS within a WHERE clause so you only include people meeting your criteria.
SELECT [PERSON_ID]
,sum(case when [DEPARTMENT] = 'Marketing' then 1 else 0 end) as Exp_Marketing
,sum(case when [FUNCTION_DESC] = 'Finance' then 1 else 0 end) as Exp_Finance
FROM [xxxx].[xxxx].[xxxx] A
WHERE EXISTS (SELECT * FROM [xxxx].[xxxx].[xxxx] B
WHERE A.PERSON_ID = B.PERSON_ID AND B.[DATE] = '2018M06')
GROUP BY [PERSON_ID]

Related

HAVING gives me "column...does not exist" but I see the column

This is a practice question from stratascratch and I'm literally stuck at the final HAVING statement.
Problem statement:
Find the total number of downloads for paying and non-paying users by date. Include only records where non-paying customers have more downloads than paying customers. The output should be sorted by earliest date first and contain 3 columns date, non-paying downloads, paying downloads.
There are three tables:
ms_user_dimension (user_id, acc_id)
ms_acc_dimension (acc_id, paying_customer)
ms_download_facts (date, user_id, downloads)
This is my code so far
SELECT date,
SUM(CASE WHEN paying_customer = 'no' THEN cnt END) AS no,
SUM(CASE WHEN paying_customer = 'yes' THEN cnt END) AS yes
FROM (
SELECT date, paying_customer, SUM(downloads) AS cnt
FROM ms_download_facts d
LEFT JOIN ms_user_dimension u ON d.user_id = u.user_id
LEFT JOIN ms_acc_dimension a ON u.acc_id = a.acc_id
GROUP BY 1, 2
ORDER BY 1, 2
) prePivot
GROUP BY date
HAVING no > yes;
If I remove the HAVING no > yes at the end, the code will run and I can see I have three columns: date, yes, and no. However, if I add the HAVING statement, I get the error "column "no" does not exist...LINE 13: HAVING no > yes"
Can't figure out for the sake of my life what's going on here. Please let me know if anyone figures out something. TIA!
You don't need a subquery for this:
SELECT d.date,
SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) AS no,
SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END) AS yes
FROM ms_download_facts d LEFT JOIN
ms_user_dimension u
ON d.user_id = u.user_id LEFT JOIN
ms_acc_dimension a
ON u.acc_id = a.acc_id
GROUP BY d.date
HAVING SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) > SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END);
You can simplify the HAVING clause to:
HAVING SUM(CASE WHEN a.paying_customer = 'no' THEN 1 ELSE -1 END) > 0
This version assumes that paying_customer only takes on the values 'yes' and 'no'.
You may be able to simplify the query further, depending on the database you are using.
It doesn't like aliases in the having statement. Replace no with:
SUM(CASE WHEN paying_customer = 'no' THEN cnt END)
and do the similar thing for yes.
SELECT date,
SUM(CASE WHEN paying_customer = 'no' THEN cnt END) AS no,
SUM(CASE WHEN paying_customer = 'yes' THEN cnt END) AS yes
FROM (
SELECT date, paying_customer, SUM(downloads) AS cnt
FROM ms_download_facts d
LEFT JOIN ms_user_dimension u ON d.user_id = u.user_id
LEFT JOIN ms_acc_dimension a ON u.acc_id = a.acc_id
GROUP BY 1, 2
ORDER BY 1, 2
) prePivot
GROUP BY date
HAVING SUM(CASE WHEN paying_customer = 'no' THEN cnt END) > SUM(CASE WHEN paying_customer = 'yes' THEN cnt END);

SQL Year over Year Performance with Criteria

I am trying to return year over year results based on date criteria. There is additional information I would like to include in the query i.e. first date of activity and first date of activity with spot name like '%6%'. The current query I have is multiplying the correct amounts by 6 and I can't figure out how to solve. When I remove the first "where" clause I get the correct amounts. Any help would be appreciated.
Select
V.IGB_License,
DBA,
V.Sci_Games_Name,
convert(date,v2.Activity_date) as First6thMachineDate,
convert(date,V3.Activity_date) as GoLiveDate,
sum(case when (v.Activity_date between '1/23/2019' and DATEADD(YEAR,-2,getdate()-1)) then v.Funds_in else 0 end) as FundsIn2019,
sum(case when (v.Activity_date between '1/23/2020' and DATEADD(YEAR,-1,getdate()-1)) then v.Funds_in else 0 end) as FundsIn2020,
sum(case when (v.Activity_date between '1/23/2021' and getdate()) then v.Funds_in else 0 end) as FundsIn2021,
sum(case when (v.Activity_date between '1/23/2019' and DATEADD(YEAR,-2,getdate()-1)) then v.Net_funds else 0 end) as NetFunds2019,
sum(case when (v.Activity_date between '1/23/2020' and DATEADD(YEAR,-1,getdate()-1)) then v.Net_funds else 0 end) as NetFunds2020,
sum(case when (v.Activity_date between '1/23/2021' and getdate()) then v.Net_funds else 0 end) as NetFunds2021
From VGT_activity V
Left Join Locations on v.IGB_License = Locations.IGB_License
left join VGT_activity V2 on v.IGB_License = v2.IGB_License
Left join VGT_activity V3 on v.IGB_License = v3.IGB_License
Where v2.Activity_date = (
Select Min(V1.Activity_date)
From VGT_activity V1
Where v1.IGB_License = V2.IGB_License
and Spot_name like '%6%'
)
and v3.Activity_date = (
Select Min(V1.Activity_date)
From VGT_activity V1
Where v1.IGB_License = V3.IGB_License
)
group by V.IGB_License, dba, V.Sci_Games_Name, v2.Activity_date, v3.Activity_date
order by 4

Trying to group several rows based on donatedmoney status

I'm really struggling with this. I just can't seem to figure it out. I've got the concept in my head but don't exactly know how to put my plain language understanding of how to solve the problem into the correct Syntax.
Here is the question.
Give me a list of all donors and their addresses categorized by whether they donated art, money, or both.
Here is the set up for the tables.
CareTakers: CareTakerID, CareTakerName
Donations: DonationID, DonorID, DonatedMoney, ArtName, ArtType, ArtAppraisedPrice, ArtLocationBuilding, ArtLocationRoom, CareTakerID
Donors: DonorID, DonorName, DonorAddress
Here is what I have for my code so far.
SELECT
DISTINCT(DonorName), DonorAddress
FROM
Donors JOIN Donations ON Donors.DonorID = Donations.DonorID
GROUP BY
DonatedMoney
HAVING
DonatedMoney = 'Y' OR DonatedMoney = 'N' OR DonatedMoney = 'Y' AND ArtName IS NOT NULL
Any help would be highly appreciated!
Why would you use a having clause? The question specifies no filtering. The following summarizes the donations to get what you need and then joins the results back to the donors table:
select d.*, don.DonationType
from donors d join
(select don.donorid,
(case when sum(case when donatedmoney = 'Y' then 1 else 0 end) > 0 and
sum(case when artname is not null then 1 else 0 end) > 0
then 'Both'
when sum(case when donatedmoney = 'Y' then 1 else 0 end) > 0
then 'Money'
when sum(case when artname is not null then 1 else 0 end)
then 'Art'
else 'Neither'
end) as DonationType
from donations don
group by don.donorid
) don
on d.donorid = don.donorid

SQL - Dividing by Sum by Group

I am just learning SQL and have run into a problem creating a custom report. I am working with school attendance data. I want to create a report that gives membership days and number of days for each absence type.
I have successfully created a report for these separately.
Membership Days (calculated by counting days school was in session between the student's entry date and the current date. Membership days does not exist as a field on its own)
SELECT sum(case when cd.DATE_VALUE >= s.ENTRYDATE and cd.DATE_VALUE <= current_timestamp THEN cd.INSESSION ELSE 0 END), s.LASTFIRST
FROM CALENDAR_DAY cd,STUDENTS s
WHERE cd.SCHOOLID = 405
GROUP BY s.LASTFIRST
Count per absence type
SELECT s.STUDENT_NUMBER, s.LASTFIRST,SUM(CASE WHEN a.ATTENDANCE_CODEID = 2 THEN 1 ELSE 0 END),SUM(CASE WHEN a.ATTENDANCE_CODEID = 4 THEN 1 ELSE 0 END),SUM(CASE WHEN a.ATTENDANCE_CODEID = 3 THEN 1 ELSE 0 END),SUM(CASE WHEN a.ATTENDANCE_CODEID = 51 THEN 1 ELSE 0 END)
FROM ATTENDANCE a
INNER join STUDENTS s
ON a.STUDENTID = s.ID
WHERE a.att_date between '%param1%' and '%param2%'
GROUP BY s.STUDENT_NUMBER, s.LASTFIRST
The problem is that if I try to put these in the same report, the membership days are multiplied by the number of times the student appears in the attendance table due to joining student and attendance. My thought on a solution was to then divide this line
sum(case when cd.DATE_VALUE >= s.ENTRYDATE and cd.DATE_VALUE <= current_timestamp THEN cd.INSESSION ELSE 0 END)
by the number of times the student showed up in the attendance table to counteract the student information existing on every line. I can't figure out how to do that. I don't know much about these types of problems, so hopefully I've just gone off on the wrong track and there is an easy solution. Thanks.
Your problem is a common problem -- trying to summarize along two dimensions at the same time without using a subquery. You want to do this query with two aggregation subqueries. Something like this:
SELECT *
FROM (SELECT sum(case when cd.DATE_VALUE >= s.ENTRYDATE and cd.DATE_VALUE <= current_timestamp
THEN cd.INSESSION
ELSE 0
END), s. STUDENT_NUMBER
FROM CALENDAR_DAY cd CROSS JOIN
STUDENTS s
WHERE cd.SCHOOLID = 405
GROUP BY s.STUDENT_NUMBER
) sc JOIN
(SELECT s.STUDENT_NUMBER, s.LASTFIRST,
SUM(CASE WHEN a.ATTENDANCE_CODEID = 2 THEN 1 ELSE 0 END),
SUM(CASE WHEN a.ATTENDANCE_CODEID = 4 THEN 1 ELSE 0 END),
SUM(CASE WHEN a.ATTENDANCE_CODEID = 3 THEN 1 ELSE 0 END),
SUM(CASE WHEN a.ATTENDANCE_CODEID = 51 THEN 1 ELSE 0 END)
FROM ATTENDANCE a INNER join
STUDENTS s
ON a.STUDENTID = s.ID
WHERE a.att_date between '%param1%' and '%param2%'
GROUP BY s.STUDENT_NUMBER, s.LASTFIRST
) sa
on sc.STUDENT_NUMBER = sa.STUDENT_NUMBER;

SQL Server Join AND QUERY Issue

I have a table billing_history which consists of the following fields
CREATE TABLE [dbo].[BILLING_HISTORY2](
[ID] [int] IDENTITY(1,1) NOT NULL,
[READING_MONTH_YEAR] [date] NULL,
[READING] [int] NULL,
[CONSUMER_ID] [int] NULL,
[payment_status] [bit] NOT NULL
)
http://www.sqlfiddle.com/#!3/892e0/5
The following queries return the MAX Paid And Max Unpaid Values for a Consumer
SELECT MAX(READING) AS 'MAXIMUM_PAID_READING',consumer_id from billing_history2
where payment_status=0 GROUP BY consumer_id;
SELECT MAX(READING) AS 'MAXIMUM_UNPAID_READING',consumer_id from billing_history2
where payment_status=1 GROUP BY consumer_id;
However when i join them to subtract the MAXIMUM_PAID_READING from MAXIMUM_UNPAID_READING to get the current reading, by joining the above two queries, it results in returning all those records which have a matched consumer_id. So if a consumer hasn't paid any bill yet, the id would be omitted in the PAID_READING and hence an INNER JOIN doesn't return any result. IF i use FULL OUTER JOIN, it return all the records but sets the difference to NULL.
I need to find out the current USAGE of the customer by subtracting it from the previous unpaid USAGE.
How do i go about joining these two queries in such a way that the resulting difference is found irrespective whether the person has paid a bill in the past or not.
Don't join them.
SELECT
MAX(CASE WHEN payment_status=0 THEN READING ELSE NULL END) AS 'MAXIMUM_PAID_READING',
MAX(CASE WHEN payment_status=1 THEN READING ELSE NULL END) AS 'MAXIMUM_UNPAID_READING',
MAX(CASE WHEN payment_status=0 THEN READING ELSE NULL END)
- MAX(CASE WHEN payment_status=1 THEN READING ELSE NULL END)
AS 'DIFF_READING',
consumer_id from billing_history2
GROUP BY consumer_id;
If you want handle NULL values from MAX function, use ISNULL function:
SELECT
ISNULL(MAX(CASE WHEN payment_status=0 THEN READING ELSE NULL END),0) AS 'MAXIMUM_PAID_READING',
ISNULL(MAX(CASE WHEN payment_status=1 THEN READING ELSE NULL END),0) AS 'MAXIMUM_UNPAID_READING',
ISNULL(MAX(CASE WHEN payment_status=0 THEN READING ELSE NULL END),0)
- ISNULL(MAX(CASE WHEN payment_status=1 THEN READING ELSE NULL END),0)
AS 'DIFF_READING',
consumer_id from billing_history2
GROUP BY consumer_id;
It's ok to use left outer join, but use isnull on your max value.
ex:
max(...) - isnull(max(...),0) as ...
In your case is necessary to use CASE expression
SELECT CONSUMER_ID,
MAX(CASE WHEN payment_status = 0 THEN READING ELSE 0 END) AS 'MAXIMUM_PAID_READING',
MAX(CASE WHEN payment_status = 1 THEN READING ELSE 0 END) AS 'MAXIMUM_UNPAID_READING',
MAX(CASE WHEN payment_status = 0 THEN READING ELSE 0 END) -
MAX(CASE WHEN payment_status = 1 THEN READING ELSE 0 END) AS diff
FROM billing_history2
GROUP BY consumer_id
Demo on SQLFiddle