SQL - Inserting a condition in a GROUP BY - sql

My issue is that some records are excluded from the result set because they are missing a Min_Date, a Max_Date, or both. I need these records to be included so that I can show the runner ran in the race even if he did not reach a First, Last, or any checkpoint. Any direction is appreciated.
SELECT A.Date, A.RunnerName, A.Duplicates, A.TotalWaypointsReached,
B.FirstWaypoint, C.LastWaypoint, C.rDateTime as MostRecent
FROM (
SELECT RunnerName,
CONVERT(NVARCHAR(25), rDatetime, 101) AS Date,
Min(case when FirstWaypoint is null OR FirstWaypoint = '' then null else rDateTime end) MIN_DATE,
Max(case when LastWaypoint is null OR LastWaypoint = '' then null else rDateTime end) MAX_DATE,
--IF(I'm missing a Max_Date, Min_Date, or both after all records in a group. Add Max(rDateTime) and Min(rDateTime))
Count(*) AS Duplicates,
SUM(TotalWaypoints) as TotalWaypointsReached
FROM Race A
GROUP BY RunnerName, CONVERT(NVARCHAR(25), rDateTime, 101)
HAVING Count(*) > 1 ) A
LEFT JOIN Race B
on A.RunnerName = B.RunnerName
and A.MIN_DATE = B.rDateTime
LEFT JOIN Race C
on A.RunnerName = C.RunnerName
and A.MAX_DATE = C.rDatetime
I'm using the select statement via SQL command in Visual Studio 2008.
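One way to express the fallback described in the commented-out line is to wrap each conditional aggregate in COALESCE, so that a group with no qualifying waypoint falls back to the plain MIN(rDateTime) or MAX(rDateTime). The sketch below is only an illustration: it runs against SQLite through Python rather than SQL Server, the sample rows are made up, and date() stands in for the CONVERT(..., 101) call.

```python
import sqlite3

# Hypothetical miniature of the Race table; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Race (RunnerName TEXT, rDateTime TEXT, "
    "FirstWaypoint TEXT, LastWaypoint TEXT, TotalWaypoints INTEGER)"
)
conn.executemany(
    "INSERT INTO Race VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "2012-05-01 08:00", "Start", "", 1),
        ("Ann", "2012-05-01 09:30", "", "Finish", 1),
        ("Bob", "2012-05-01 08:05", "", "", 0),      # Bob never reached a waypoint
        ("Bob", "2012-05-01 08:45", None, None, 0),
    ],
)

rows = conn.execute(
    """
    SELECT RunnerName,
           date(rDateTime) AS RaceDate,
           -- fall back to the plain MIN/MAX when no row in the group qualifies
           COALESCE(MIN(CASE WHEN FirstWaypoint IS NULL OR FirstWaypoint = ''
                             THEN NULL ELSE rDateTime END),
                    MIN(rDateTime)) AS MIN_DATE,
           COALESCE(MAX(CASE WHEN LastWaypoint IS NULL OR LastWaypoint = ''
                             THEN NULL ELSE rDateTime END),
                    MAX(rDateTime)) AS MAX_DATE
    FROM Race
    GROUP BY RunnerName, date(rDateTime)
    HAVING COUNT(*) > 1
    ORDER BY RunnerName
    """
).fetchall()

for row in rows:
    print(row)
```

With these rows, Bob still gets a MIN_DATE and MAX_DATE, so the later LEFT JOINs on those columns have something to match.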

Related

HAVING gives me "column...does not exist" but I see the column

This is a practice question from stratascratch and I'm literally stuck at the final HAVING statement.
Problem statement:
Find the total number of downloads for paying and non-paying users by date. Include only records where non-paying customers have more downloads than paying customers. The output should be sorted by earliest date first and contain 3 columns date, non-paying downloads, paying downloads.
There are three tables:
ms_user_dimension (user_id, acc_id)
ms_acc_dimension (acc_id, paying_customer)
ms_download_facts (date, user_id, downloads)
This is my code so far
SELECT date,
SUM(CASE WHEN paying_customer = 'no' THEN cnt END) AS no,
SUM(CASE WHEN paying_customer = 'yes' THEN cnt END) AS yes
FROM (
SELECT date, paying_customer, SUM(downloads) AS cnt
FROM ms_download_facts d
LEFT JOIN ms_user_dimension u ON d.user_id = u.user_id
LEFT JOIN ms_acc_dimension a ON u.acc_id = a.acc_id
GROUP BY 1, 2
ORDER BY 1, 2
) prePivot
GROUP BY date
HAVING no > yes;
If I remove the HAVING no > yes at the end, the code will run and I can see I have three columns: date, yes, and no. However, if I add the HAVING statement, I get the error "column "no" does not exist...LINE 13: HAVING no > yes"
I can't figure out for the life of me what's going on here. Please let me know if anyone figures something out. TIA!
You don't need a subquery for this:
SELECT d.date,
SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) AS no,
SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END) AS yes
FROM ms_download_facts d LEFT JOIN
ms_user_dimension u
ON d.user_id = u.user_id LEFT JOIN
ms_acc_dimension a
ON u.acc_id = a.acc_id
GROUP BY d.date
HAVING SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) > SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END);
You can simplify the HAVING clause to:
HAVING SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads ELSE - d.downloads END) > 0
This version assumes that paying_customer only takes on the values 'yes' and 'no'.
You may be able to simplify the query further, depending on the database you are using.
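To check that comparing two conditional sums and testing a single signed sum of downloads pick the same dates, here is a small sketch using SQLite from Python. The rows are invented, the column aliases non_paying and paying are my own, and it assumes paying_customer is only ever 'yes' or 'no'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ms_user_dimension (user_id INTEGER, acc_id INTEGER);
    CREATE TABLE ms_acc_dimension (acc_id INTEGER, paying_customer TEXT);
    CREATE TABLE ms_download_facts (date TEXT, user_id INTEGER, downloads INTEGER);
    INSERT INTO ms_user_dimension VALUES (1, 10), (2, 20);
    INSERT INTO ms_acc_dimension VALUES (10, 'yes'), (20, 'no');
    -- made-up rows: non-paying users lead on 2020-08-15 only
    INSERT INTO ms_download_facts VALUES
        ('2020-08-15', 1, 3), ('2020-08-15', 2, 7),
        ('2020-08-16', 1, 9), ('2020-08-16', 2, 2);
    """
)

base = """
    SELECT d.date,
           SUM(CASE WHEN a.paying_customer = 'no'  THEN d.downloads END) AS non_paying,
           SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END) AS paying
    FROM ms_download_facts d
    LEFT JOIN ms_user_dimension u ON d.user_id = u.user_id
    LEFT JOIN ms_acc_dimension a ON u.acc_id = a.acc_id
    GROUP BY d.date
    HAVING {condition}
    ORDER BY d.date
"""

# Variant 1: repeat both aggregate expressions in HAVING.
two_sums = (
    "SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) > "
    "SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END)"
)
# Variant 2: one signed sum; positive means non-paying downloads are ahead.
signed_sum = (
    "SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads "
    "ELSE -d.downloads END) > 0"
)

rows_two_sums = conn.execute(base.format(condition=two_sums)).fetchall()
rows_signed = conn.execute(base.format(condition=signed_sum)).fetchall()
print(rows_two_sums, rows_signed)
```

Note that the signed version has to add and subtract downloads, not 1 and -1; counting rows would compare the number of download records instead of the download totals.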
The database doesn't accept SELECT-list aliases in the HAVING clause, so you have to repeat the aggregate expressions there. Replace no with:
SUM(CASE WHEN paying_customer = 'no' THEN cnt END)
and do the similar thing for yes.
SELECT date,
SUM(CASE WHEN paying_customer = 'no' THEN cnt END) AS no,
SUM(CASE WHEN paying_customer = 'yes' THEN cnt END) AS yes
FROM (
SELECT date, paying_customer, SUM(downloads) AS cnt
FROM ms_download_facts d
LEFT JOIN ms_user_dimension u ON d.user_id = u.user_id
LEFT JOIN ms_acc_dimension a ON u.acc_id = a.acc_id
GROUP BY 1, 2
ORDER BY 1, 2
) prePivot
GROUP BY date
HAVING SUM(CASE WHEN paying_customer = 'no' THEN cnt END) > SUM(CASE WHEN paying_customer = 'yes' THEN cnt END);

MSSQL Group by and Select rows from grouping

I'm trying to figure out if what I'm trying to do is possible. Instead of resorting to multiple queries on a table, I wanted to group the records by business date and id then group by the id and select one date for a field and another date for the other field.
SELECT
*
{AMOUNT FROM DATE}
{AMOUNT FROM OTHER DATE}
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
) AS subquery
GROUP BY id
It seems that you're looking to do a pivot query. I usually use cross tabs for this. Based on the query you posted, it could look like:
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
) AS subquery
GROUP BY id;
You could also use a CTE.
WITH CTE AS(
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
Or even be a rebel and do the operation directly.
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM table
GROUP BY id;
However, some people have tested this and found that pre-aggregating can improve performance.
If I understand you correctly, then you're just trying to pivot, but only with two particular dates:
select id,
date1 = sum(iif(date = '2000-01-01', amount, null)),
date2 = sum(iif(date = '2000-01-02', amount, null))
from [table]
group by id
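The answers above differ in one detail: ELSE 0 versus no ELSE branch (or null in the iif). A quick sketch with SQLite from Python shows the effect when an id has no rows for one of the dates. The table name payments and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (1, "20190901", 10.0),
        (1, "20190901", 5.0),
        (1, "20191001", 2.5),
        (2, "20190901", 4.0),  # id 2 has no row for 20191001
    ],
)

query = """
    SELECT id,
           SUM(CASE WHEN date = '20190901' THEN amount {else_branch} END) AS sept,
           SUM(CASE WHEN date = '20191001' THEN amount {else_branch} END) AS oct
    FROM payments
    GROUP BY id
    ORDER BY id
"""

# ELSE 0 turns "no matching rows" into a zero total.
with_zero = conn.execute(query.format(else_branch="ELSE 0")).fetchall()
# Without ELSE, non-matching rows contribute NULL, so the total stays NULL.
with_null = conn.execute(query.format(else_branch="")).fetchall()

print(with_zero)
print(with_null)
```

Which one you want depends on whether "no rows for that date" should read as zero or as missing.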

Hive rolling sum of data over date

I am working on Hive and am facing an issue with rolling counts. The sample data I am working on is as shown below:
and the output I am expecting is as shown below:
I tried using the following query but it is not returning the rolling count:
select event_dt,status, count(distinct account) from
(select *, row_number() over (partition by account order by event_dt
desc)
as rnum from table.A
where event_dt between '2018-05-02' and '2018-05-04') x where rnum =1
group by event_dt, status;
Please help me with this if someone has solved a similar issue.
You seem to just want conditional aggregation:
select event_dt,
sum(case when status = 'Registered' then 1 else 0 end) as registered,
sum(case when status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when status = 'suspended' then 1 else 0 end) as suspended,
sum(case when status = 'reactive' then 1 else 0 end) as reactive
from table.A
group by event_dt
order by event_dt;
EDIT:
This is a tricky problem. The solution I've come up with does a cross-product of dates and users and then calculates the most recent status as of each date.
So:
select a.event_dt,
sum(case when aa.status = 'Registered' then 1 else 0 end) as registered,
sum(case when aa.status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when aa.status = 'suspended' then 1 else 0 end) as suspended,
sum(case when aa.status = 'reactive' then 1 else 0 end) as reactive
from (select d.event_dt, ac.account, a.status,
max(case when a.status is not null then a.timestamp end) over (partition by ac.account order by d.event_dt) as last_status_timestamp
from (select distinct event_dt from table.A) d cross join
(select distinct account from table.A) ac left join
(select a.*,
row_number() over (partition by account, event_dt order by timestamp desc) as seqnum
from table.A a
) a
on a.event_dt = d.event_dt and
a.account = ac.account and
a.seqnum = 1 -- get the last one on the date
) a left join
table.A aa
on aa.timestamp = a.last_status_timestamp and
aa.account = a.account
group by a.event_dt
order by a.event_dt;
What this is doing is creating a derived table with rows for all accounts and dates. This has the status on certain days, but not all days.
The cumulative max for last_status_timestamp calculates the most recent timestamp that has a valid status. This is then joined back to the table to get the status on that date. Voila! This is the status used for the conditional aggregation.
The cumulative max and join is a work-around because Hive does not (yet?) support the ignore nulls option in lag().
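The same "status as of each date" idea can be tried out on a toy data set. The sketch below is not Hive: it uses SQLite from Python, and it swaps the cumulative max plus join-back for a correlated subquery that picks the latest status on or before each date. The table name events, the column ts, and the rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (account TEXT, event_dt TEXT, status TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("acct1", "2018-05-02", "Registered", "2018-05-02 09:00"),
        ("acct2", "2018-05-02", "Registered", "2018-05-02 10:00"),
        ("acct1", "2018-05-03", "active_acct", "2018-05-03 08:00"),
        ("acct2", "2018-05-04", "suspended", "2018-05-04 12:00"),
    ],
)

rows = conn.execute(
    """
    SELECT event_dt,
           SUM(CASE WHEN status = 'Registered'  THEN 1 ELSE 0 END) AS registered,
           SUM(CASE WHEN status = 'active_acct' THEN 1 ELSE 0 END) AS active_acct,
           SUM(CASE WHEN status = 'suspended'   THEN 1 ELSE 0 END) AS suspended
    FROM (
        -- one row per (date, account), carrying the most recent status so far
        SELECT d.event_dt,
               ac.account,
               (SELECT e.status
                FROM events e
                WHERE e.account = ac.account
                  AND e.event_dt <= d.event_dt
                ORDER BY e.ts DESC
                LIMIT 1) AS status
        FROM (SELECT DISTINCT event_dt FROM events) d
        CROSS JOIN (SELECT DISTINCT account FROM events) ac
    ) latest
    GROUP BY event_dt
    ORDER BY event_dt
    """
).fetchall()

for row in rows:
    print(row)
```

Each account is counted once per date under whatever its latest status was at that point, which is what makes the counts roll forward.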

How to show 0 value using COUNT and SELECT on a SQL query

I have ONLY 1 table called Meeting that stores all meeting requests.
This table can be EMPTY.
It has several columns, including requestType (which can only be "MT"), meetingStatus (which can only be pending, approved, denied, or canceled), and meetingCreatedTime.
I want to count how many requests there are of each status (in other words, how many requests are pending, approved, denied, and canceled) over the last 30 days.
The problem is that if there are no requests, nothing is displayed, but I want to display 0. How do I do it? Here is my query now:
SELECT [requestType],
( SELECT COUNT ([requestType]) FROM [Meeting] WHERE CAST([meetingCreatedTime] AS DATE) >= CAST(DateAdd(DAY,-30,Getdate()) AS DATE) AND [meetingStatus] = 'Approved') As 'Approved',
( SELECT COUNT ([requestType]) FROM [Meeting] WHERE CAST([meetingCreatedTime] AS DATE) >= CAST(DateAdd(DAY,-30,Getdate()) AS DATE) AND [meetingStatus] = 'Pending') As 'Pending',
( SELECT COUNT ([requestType]) FROM [Meeting] WHERE CAST([meetingCreatedTime] AS DATE) >= CAST(DateAdd(DAY,-30,Getdate()) AS DATE) AND [meetingStatus] = 'Canceled') As 'Canceled',
( SELECT COUNT ([requestType]) FROM [Meeting] WHERE CAST([meetingCreatedTime] AS DATE) >= CAST(DateAdd(DAY,-30,Getdate()) AS DATE) AND [meetingStatus] = 'Denied') As 'Denied'
FROM [Meeting]
WHERE CAST([meetingCreatedTime] AS DATE) >= CAST(DateAdd(DAY,-30,Getdate()) AS DATE) GROUP BY [requestType]
Result:
What I want is:
SELECT
RT.requestType,
SUM(CASE WHEN M.meetingStatus = 'Approved' THEN 1 ELSE 0 END) AS Approved,
SUM(CASE WHEN M.meetingStatus = 'Pending' THEN 1 ELSE 0 END) AS Pending,
SUM(CASE WHEN M.meetingStatus = 'Canceled' THEN 1 ELSE 0 END) AS Canceled,
SUM(CASE WHEN M.meetingStatus = 'Denied' THEN 1 ELSE 0 END) AS Denied
FROM
(SELECT DISTINCT requestType FROM Meeting) RT
LEFT OUTER JOIN Meeting M ON
M.requestType = RT.requestType AND
M.meetingCreatedTime >= DATEADD(DAY, -30, GETDATE())
GROUP BY
RT.requestType
The SUMs are a much clearer (IMO) and much more efficient way of getting the counts that you need. Using the requestType table (assuming that you have one) lets you get results for every request type even if there are no meetings of that type in the date range. The LEFT OUTER JOIN to the meeting table allows the request type to still show up even if there are no meetings for that time period.
All of your CASTs between date values seem unnecessary.
Move those subqueries into simple sum/case statements:
select rt.request_type,
sum(case when [meetingStatus] = 'Approved' then 1 else 0 end),
sum(case when [meetingStatus] = 'Pending' then 1 else 0 end),
sum(case when [meetingStatus] = 'Canceled' then 1 else 0 end),
sum(case when [meetingStatus] = 'Denied' then 1 else 0 end)
from ( select 'MT' ) rt (request_type) --hopefully you have lookup table for this
left
join [Meeting] m on
rt.request_type = m.requestType and
CAST([meetingCreatedTime] AS DATE) >= CAST(DateAdd(DAY,-30,Getdate()) AS DATE)
group
by rt.request_type;
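To see that a fixed row on the left of the join really yields zeros when Meeting is empty, here is a small sketch using SQLite from Python. It is SQLite rather than SQL Server, so date('now', '-30 days') stands in for the DATEADD/GETDATE expression, and the inserted rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Meeting (requestType TEXT, meetingStatus TEXT, meetingCreatedTime TEXT)"
)

query = """
    SELECT rt.requestType,
           SUM(CASE WHEN m.meetingStatus = 'Approved' THEN 1 ELSE 0 END) AS Approved,
           SUM(CASE WHEN m.meetingStatus = 'Pending'  THEN 1 ELSE 0 END) AS Pending,
           SUM(CASE WHEN m.meetingStatus = 'Canceled' THEN 1 ELSE 0 END) AS Canceled,
           SUM(CASE WHEN m.meetingStatus = 'Denied'   THEN 1 ELSE 0 END) AS Denied
    FROM (SELECT 'MT' AS requestType) rt          -- always exactly one row
    LEFT JOIN Meeting m
           ON m.requestType = rt.requestType
          AND m.meetingCreatedTime >= date('now', '-30 days')
    GROUP BY rt.requestType
"""

# Empty table: the LEFT JOIN keeps the 'MT' row, and every CASE falls to ELSE 0.
empty_result = conn.execute(query).fetchall()

conn.executemany(
    "INSERT INTO Meeting VALUES (?, ?, datetime('now', ?))",
    [
        ("MT", "Approved", "-1 days"),
        ("MT", "Approved", "-2 days"),
        ("MT", "Pending", "-3 days"),
        ("MT", "Denied", "-60 days"),  # outside the 30-day window, not counted
    ],
)
filled_result = conn.execute(query).fetchall()

print(empty_result)
print(filled_result)
```

The date filter has to stay in the ON clause; moving it to WHERE would drop the 'MT' row again whenever nothing matches.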
This is one possible approach to force one line to be visible in any case. Adapt this to your needs...
Copy it into an empty query window and execute... play around with the WHERE part...
DECLARE @Test TABLE (ID INT IDENTITY, GroupingKey VARCHAR(100));
INSERT INTO @Test VALUES ('a'),('a'),('b');
SELECT TOP 1 tbl.CountOfA
,tbl.CountOfB
,tbl.CountOfC
FROM
(
SELECT 1 AS Marker
,(SELECT COUNT(*) FROM @Test WHERE GroupingKey='a') AS CountOfA
,(SELECT COUNT(*) FROM @Test WHERE GroupingKey='b') AS CountOfB
,(SELECT COUNT(*) FROM @Test WHERE GroupingKey='c') AS CountOfC
WHERE (1=1) --play here with (1=0) and (1=1)
UNION ALL
SELECT 2,0,0,0
) AS tbl
ORDER BY Marker

Inline Table Join Multiplying Results

The query below joins two views and one inline table to another inline table. When I run the query without table FI, all of the SUM values are correct; however, when I run it with table FI, all of the SUM values from vw_Interactions are multiplied and come back incorrect (the values from vw_LeadInteractions are not affected).
vw_Interactions is a transactional log and returns a 1 in each column where that measure is true (ex: a 1 is returned in I.[Call] where a phone call was logged), and vw_LeadInteractions is the same except it returns the Client's ID.
I did several hours of research and found that inline tables can cause issues when joining (the Cartesian product?), however I wasn't able to understand how those answers were relevant to this query.
Can someone explain why including table FI in this query multiplies the SUM values of everything from vw_Interactions? And how do I fix my query so this does not happen?
This query is for my employer's outbound call center to measure what's happening during each 'round' of calling.
/* Parameters */
DECLARE @StartDatetime AS Date
SET @StartDatetime = '06/01/13'
DECLARE @EndDatetime AS Date
SET @EndDatetime = '05/31/14'
/* Dataset */
SELECT R.[RoundsGoal]
,R.[RoundNumber]
,COUNT(DISTINCT R.[Client_Id]) AS 'Leads'
,ISNULL(SUM(I.[Call]), 0) AS 'Calls'
,ISNULL(COUNT(DISTINCT LI.[Call]), 0) AS 'CallLeads'
,ISNULL(SUM(FI.[FirstCall]), 0) AS 'FirstCalls'
,ISNULL(SUM(I.[DecisionMakerCall]), 0) AS 'DecisionMakerCalls'
,ISNULL(COUNT(DISTINCT LI.[DecisionMakerCall]), 0) AS 'DecisionMakerCallLeads'
,ISNULL(SUM(FI.[FirstDecisionMakerCall]), 0) AS 'FirstDecisionMakerCalls'
,ISNULL(SUM( I.[LeftMessageCall]), 0) AS 'LeftMessageCalls'
,ISNULL(COUNT(DISTINCT LI.[LeftMessageCall]), 0) AS 'LeftMessageLeads'
,ISNULL(SUM(FI.[FirstLeftMessageCall]), 0) AS 'FirstLeftMessageCalls'
,ISNULL(SUM(I.[NoAnswerCall]), 0) AS 'NoAnswerCalls'
,ISNULL(COUNT(DISTINCT LI.[NoAnswerCall]), 0) AS 'NoAnswerCallLeads'
,ISNULL(SUM(FI.[FirstNoAnswerCall]), 0) AS 'FirstNoAnswerCalls'
FROM (
SELECT RD.[Client_Id]
,ISNULL(UF1.[NumericCol], 0) AS 'RoundsGoal'
,COUNT(RD.[RoundDate]) OVER(PARTITION BY RD.[Client_Id] ORDER BY RD.[RoundDate] ASC) AS 'RoundNumber'
,RD.[RoundDate]
FROM [dbo].[vw_RoundDates] RD
LEFT JOIN [dbo].[AMGR_User_Fields] UF1 ON RD.[Client_Id] = UF1.[Client_Id] AND UF1.[Type_Id] = 140 --Rounds Goal TypeId
LEFT JOIN [dbo].[AMGR_User_Field_Defs] UFD1 ON UF1.[Type_Id] = UFD1.[Type_Id] AND UF1.[Code_Id] = UFD1.[Code_Id]
WHERE RD.[RoundDate] >= @StartDatetime AND RD.[RoundDate] <= @EndDatetime
) R
LEFT JOIN [dbo].[vw_Interactions] I ON R.[Client_Id] = I.[Client_Id] AND R.[RoundDate] = CAST(I.[Created] AS DATE)
LEFT JOIN [dbo].[vw_LeadInteractions] LI ON R.[Client_Id] = LI.[Client_Id] AND R.[RoundDate] = CAST(LI.[Created] AS DATE)
LEFT JOIN (
SELECT I.[Client_Id]
,CASE WHEN (CASE WHEN I.[Call] = 1 THEN ROW_NUMBER() OVER(PARTITION BY I.[Client_Id], I.[Call] ORDER BY I.[Created] ASC) ELSE NULL END) = 1 THEN 1 ELSE NULL END AS 'FirstCall'
,CASE WHEN (CASE WHEN I.[DecisionMakerCall] = 1 THEN ROW_NUMBER() OVER(PARTITION BY I.[Client_Id], I.[DecisionMakerCall] ORDER BY I.[Created] ASC) ELSE NULL END) = 1 THEN 1 ELSE NULL END AS 'FirstDecisionMakerCall'
,CASE WHEN (CASE WHEN I.[LeftMessageCall] = 1 THEN ROW_NUMBER() OVER(PARTITION BY I.[Client_Id], I.[LeftMessageCall] ORDER BY I.[Created] ASC) ELSE NULL END) = 1 THEN 1 ELSE NULL END AS 'FirstLeftMessageCall'
,CASE WHEN (CASE WHEN I.[NoAnswerCall] = 1 THEN ROW_NUMBER() OVER(PARTITION BY I.[Client_Id], I.[NoAnswerCall] ORDER BY I.[Created] ASC) ELSE NULL END) = 1 THEN 1 ELSE NULL END AS 'FirstNoAnswerCall'
,[Created]
FROM [dbo].[vw_Interactions] I
) FI ON R.[Client_Id] = FI.[Client_Id] AND R.[RoundDate] = CAST(FI.[Created] AS DATE)
GROUP BY R.[RoundsGoal]
,R.[RoundNumber]
ORDER BY R.[RoundsGoal] ASC
,R.[RoundNumber] ASC
Here is the correct result set without table FI. Notice the Calls on row 23 equals 135,110.
Here are the incorrect results, which include table FI. Notice the Calls on row 23 are multiplied to 1,561,038.
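A likely explanation, sketched below on toy data, is join fan-out: I and FI both come from vw_Interactions and are joined to R on the same client and date, so every I row is paired with every FI row for that client and day before SUM runs. COUNT(DISTINCT ...) is immune to the repetition, which would fit the LI columns staying correct. The example uses SQLite from Python; the table names rounds and interactions, the column is_call, and the rows are hypothetical stand-ins for the real views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rounds (client_id INTEGER, round_date TEXT);
    CREATE TABLE interactions (client_id INTEGER, created TEXT, is_call INTEGER);
    INSERT INTO rounds VALUES (1, '2013-06-03');
    -- three calls for the same client on the same round date
    INSERT INTO interactions VALUES
        (1, '2013-06-03 09:00', 1),
        (1, '2013-06-03 11:00', 1),
        (1, '2013-06-03 15:00', 1);
    """
)

# Joining the detail rows twice on the same key gives 3 x 3 = 9 rows per round.
fanned_out = conn.execute(
    """
    SELECT r.client_id, SUM(i.is_call) AS calls
    FROM rounds r
    LEFT JOIN interactions i
           ON r.client_id = i.client_id AND r.round_date = date(i.created)
    LEFT JOIN interactions fi
           ON r.client_id = fi.client_id AND r.round_date = date(fi.created)
    GROUP BY r.client_id
    """
).fetchall()

# Pre-aggregating each side to one row per client and day avoids the fan-out.
pre_aggregated = conn.execute(
    """
    SELECT r.client_id, i.calls, fi.first_calls
    FROM rounds r
    LEFT JOIN (SELECT client_id, date(created) AS day, SUM(is_call) AS calls
               FROM interactions
               GROUP BY client_id, date(created)) i
           ON r.client_id = i.client_id AND r.round_date = i.day
    LEFT JOIN (SELECT client_id, date(MIN(created)) AS day, 1 AS first_calls
               FROM interactions
               WHERE is_call = 1
               GROUP BY client_id) fi
           ON r.client_id = fi.client_id AND r.round_date = fi.day
    """
).fetchall()

print(fanned_out)      # calls inflated by the second join
print(pre_aggregated)  # one row per round with the true totals
```

If this is what is happening in the real query, aggregating vw_Interactions and the FI subquery down to one row per Client_Id and date before joining them to R should bring the Calls totals back in line.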