getting a distinct count from with a date field

I have a piece of code that is looking for the distinct count of kegs, plus the counts of the distinct kegs that are tagged and untagged. What I have so far is:
with CTE as
(select UID_KEG, IS_TAGGED, movement_date
from MOVEMENT M
inner join Keg on M.UID_Keg = Keg.Unique_ID
where DATEPART(year,Movement_date) = '2019'
and UID_MOVEMENT_TYPE = 1
)
select COUNT(Distinct CTE.UID_KEG) as 'Kegs', datepart(week,movement_date)
as 'Week number',
SUM(case when Is_Tagged = 1 then 1 end) as 'tagged',
SUM(case when Is_Tagged = 0 then 1 end) as 'untagged'
from CTE
group by datepart(week,movement_date)
order by [Week number] asc
It correctly returns a distinct count of the kegs, but the figures for tagged and untagged are incorrect, and I can only assume it's because it's counting duplicate kegs.
Can anyone advise how I can get round this, or do a count on just the distinct kegs?

You want conditional aggregation using COUNT(DISTINCT). That would be:
SELECT COUNT(DISTINCT CTE.UID_KEG) as Kegs,
datepart(week, movement_date) as Week_number,
COUNT(DISTINCT CASE WHEN Is_Tagged = 1 THEN CTE.UID_KEG END) as tagged,
COUNT(DISTINCT CASE WHEN Is_Tagged = 0 THEN CTE.UID_KEG END) as untagged
FROM CTE
GROUP BY datepart(week, movement_date)
ORDER BY MIN(movement_date);
Notes:
The tagged and untagged counts may still add up to more than the total count, assuming that kegs can be both tagged and untagged in a single week.
You should include the year() as well as the week, especially if the query is ever run across more than one year.
Only use single quotes for string and date constants. Do not use them for column aliases; that can lead to hard-to-debug errors.
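
To make the difference concrete, here is a minimal sketch that runs the same kind of query against a tiny in-memory SQLite table from Python. SQLite stands in for SQL Server here, and the table name, column names and sample rows are made up for illustration. It shows COUNT(DISTINCT CASE ...) counting each keg once, while a plain conditional SUM counts every scan.

```python
import sqlite3

# Hypothetical miniature of the MOVEMENT data: one row per scan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movement (uid_keg INTEGER, is_tagged INTEGER, week INTEGER)")
conn.executemany(
    "INSERT INTO movement VALUES (?, ?, ?)",
    [
        (1, 1, 1), (1, 1, 1),  # keg 1 scanned twice, tagged both times
        (2, 0, 1),             # keg 2 scanned once, untagged
        (3, 0, 1), (3, 1, 1),  # keg 3 seen both untagged and tagged
    ],
)

query = """
SELECT week,
       COUNT(DISTINCT uid_keg) AS kegs,
       COUNT(DISTINCT CASE WHEN is_tagged = 1 THEN uid_keg END) AS tagged,
       COUNT(DISTINCT CASE WHEN is_tagged = 0 THEN uid_keg END) AS untagged,
       SUM(CASE WHEN is_tagged = 1 THEN 1 ELSE 0 END) AS tagged_rows
FROM movement
GROUP BY week
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this sample data, keg 3 lands in both the tagged and the untagged count, which is the situation the first note describes.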

If you remove the DISTINCT from your count, the sum of untagged and tagged should equal your total (if Is_Tagged is a binary 0 or 1). This indicates that you have duplicate UID_KEG values. Take some time to understand why; part of the problem seems to be that the shape of the dataset is not yet well understood.
Look at the data to see whether there are duplicates (and why: are they caused by the join, or are they in the base data?), and whether the same keg can appear as both tagged and untagged.
EDIT: In response to your comment. If kegs can be scanned twice, you will have to assume that if Is_Tagged = 1 for any row of a UID_KEG in a given week, then that keg counts as tagged for that week.
In that case you will have to adapt the code to use this assumption.
WITH CTE
AS (
SELECT UID_KEG
,IS_TAGGED
,movement_date
FROM MOVEMENT M
INNER JOIN Keg ON M.UID_Keg = Keg.Unique_ID
WHERE DATEPART(year, Movement_date) = '2019'
AND UID_MOVEMENT_TYPE = 1
)
SELECT CTE.UID_KEG AS [Kegs]
,datepart(week, movement_date) AS [Week number]
,MAX(CAST(Is_Tagged AS INT)) AS [tagged] -- CAST in case Is_Tagged is a BIT column, which MAX does not accept
FROM CTE
GROUP BY CTE.UID_KEG
,datepart(week, movement_date)
ORDER BY [Week number] ASC
This code might not be perfect, as I couldn't test it, but it should give you a complete list of each keg in each week, along with whether that keg was marked as tagged at least once or not at all.
The most important thing here is removing the duplication of kegs within each week; after that it is possible to calculate the counts.
I'm not great with CTEs, but you will need to aggregate one level up from this per-keg, per-week result; then you will be able to count the distinct number of kegs and how many were tagged and untagged.
Hope that makes sense.
EDIT: here is a subquery that should work
SELECT [Week number]
,count(1) [numKegs]
,sum(tagged) [numTagged]
,count(1) - sum(tagged) [numUntagged] -- kegs never marked as tagged in that week
FROM (
SELECT UID_KEG AS [Kegs]
,datepart(week, movement_date) AS [Week number]
,MAX(CAST(IS_TAGGED AS INT)) AS [tagged] -- CAST in case IS_TAGGED is a BIT column, which MAX does not accept
FROM MOVEMENT M
INNER JOIN Keg ON M.UID_Keg = Keg.Unique_ID
WHERE DATEPART(year, Movement_date) = 2019
AND UID_MOVEMENT_TYPE = 1
GROUP BY UID_KEG
,datepart(week, movement_date)
) kegweeklevel
GROUP BY [Week number]
ORDER BY [Week number] ASC

Related

SQL with as expression shows multiple results

I am writing a SQL query using a WITH ... AS expression. I always get the square of the number of rows I need.
This is my query:
DECLARE @MAX_DATE AS INT
SET @MAX_DATE = (SELECT DATEPART(MONTH,FECHA) FROM ALBVENTACAB WHERE NUMALBARAN IN (SELECT DISTINCT MAX(NUMALBARAN) FROM ALBVENTACAB));
;WITH TABLE_LAST AS (
SELECT CONCAT(DATEPART(MONTH,FECHA),'-',DATEPART(YEAR,FECHA)) as LAST_YEAR_MONTH
,SUM(TOTALNETO) AS LAST_YEAR_VALUE
FROM ALBVENTACAB
WHERE DATEPART(YEAR,CURRENT_TIMESTAMP) -1 = DATEPART(YEAR,FECHA) AND NUMSERIE LIKE 'A%'
AND DATEPART(MONTH,FECHA) <= @MAX_DATE
GROUP BY CONCAT(DATEPART(MONTH,FECHA),'-',DATEPART(YEAR,FECHA))
)
,TABLE_CURRENT AS(
SELECT CONCAT(DATEPART(MONTH,FECHA),'-',DATEPART(YEAR,FECHA)) as CURR_YEAR_MONTH
,SUM(TOTALNETO) AS CURR_YEAR_VALUE
FROM ALBVENTACAB
WHERE DATEPART(YEAR,CURRENT_TIMESTAMP) <= DATEPART(YEAR,FECHA) AND NUMSERIE LIKE 'A%'
GROUP BY CONCAT(DATEPART(MONTH,FECHA),'-',DATEPART(YEAR,FECHA))
)
SELECT *
FROM TABLE_CURRENT, TABLE_LAST
When I run the query I get exactly the square of the number of rows I expect.
I want to compare monthly sales with the same months of last year.
2-2020 814053.3 2-2019 840295.1
1-2020 1094993.65 2-2019 840295.1
3-2020 293927.3 2-2019 840295.1
2-2020 814053.3 1-2019 1050701.68
1-2020 1094993.65 1-2019 1050701.68
3-2020 293927.3 1-2019 1050701.68
2-2020 814053.3 3-2019 887776.1
1-2020 1094993.65 3-2019 887776.1
3-2020 293927.3 3-2019 887776.1
I should get only 3 rows instead of 9 rows.

You need to properly join your two CTEs - the way you're doing it now, you're getting a Cartesian product: every row of one CTE combined with every row of the other.
Do something like:
;WITH TABLE_LAST AS
( ....
),
TABLE_CURRENT AS
( ....
)
SELECT *
FROM TABLE_CURRENT curr
INNER JOIN TABLE_LAST last ON (some join condition here)
What that join condition is going to be - I have no idea, and cannot tell from your question - but you have to define how these two sets of data "connect" ....
It could be something like:
SELECT *
FROM TABLE_CURRENT curr
INNER JOIN TABLE_LAST last ON curr.CURR_YEAR_MONTH = last.LAST_YEAR_MONTH
or whatever else makes sense in your situation - but basically, you need to somehow "tie together" these two sets of data and get only those rows that make sense - not just every row from "last" combined with every row from "curr" ....
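
One caveat on picking the join condition: the year-month strings built in the question contain the year ('2-2020' versus '2-2019'), so comparing them for equality would match nothing. Joining on the month number alone is one option. Here is a minimal sketch of that, run against in-memory SQLite from Python as a stand-in for SQL Server; the two tables play the role of the CTEs, the figures are the ones from the question, and the `month` column and `prev` alias are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-ins for the two CTEs, keyed by month number instead of a 'M-YYYY' string.
conn.execute("CREATE TABLE table_current (month INTEGER, curr_year_value REAL)")
conn.execute("CREATE TABLE table_last (month INTEGER, last_year_value REAL)")
conn.executemany(
    "INSERT INTO table_current VALUES (?, ?)",
    [(1, 1094993.65), (2, 814053.3), (3, 293927.3)],
)
conn.executemany(
    "INSERT INTO table_last VALUES (?, ?)",
    [(1, 1050701.68), (2, 840295.1), (3, 887776.1)],
)

# Comma-separated FROM list: every row paired with every row.
cross = conn.execute("SELECT * FROM table_current, table_last").fetchall()

# Explicit join on the month number: one row per month.
joined = conn.execute("""
    SELECT curr.month, curr.curr_year_value, prev.last_year_value
    FROM table_current AS curr
    INNER JOIN table_last AS prev ON curr.month = prev.month
    ORDER BY curr.month
""").fetchall()

print(len(cross), len(joined))
```

The comma join returns 3 x 3 rows, the explicit join returns one row per month.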

While you already got the answer on how to join the two results, I thought I'd tell you how to typically approach such problems.
From the same table, you want two sums on different conditions (different years that is). You solve this with conditional aggregation, which does just that: aggregate (sum) based on a condition (year).
select
datepart(month, fecha) as month,
sum(case when datepart(year, fecha) = datepart(year, getdate()) then totalneto end) as this_year,
sum(case when datepart(year, fecha) = datepart(year, getdate()) -1 then totalneto end) as last_year
from albventacab
where numserie like 'A%'
and fecha > dateadd(year, -2, getdate())
group by datepart(month, fecha)
order by datepart(month, fecha);

SQL - Grouping by Last Day of Quarter

I currently have a query running to average survey scores for agents. We use the date range of the LastDayOfTheQuarter and 180 days back to calculate these scores. I ran into an issue for this current quarter.
One of my agents hasn't received any surveys in 2020 which is causing the query to not pull the current lastdayofquarter and 180 days back of results.
The code I am using:
SELECT
Agent,
U.Position,
U.BranchDescription,
(ADDDATE(LastDayOfQuarter, -180)) AS MinDate,
(LastDayOfQuarter) AS MaxDate,
COUNT(DISTINCT `Response ID`) as SurveyCount,
AVG(CASE WHEN `Question ID` = 'Q1_2' THEN `Answer Value` END) AS EngagedScore,
AVG(CASE WHEN `Question ID` = 'Q1_3' THEN `Answer Value` END) AS KnowledgableScore,
AVG(CASE WHEN `Question ID` = 'Q1_6' THEN `Answer Value` END) AS ValuedScore
FROM qualtrics_responses
LEFT JOIN date D
ON (D.`Date`) = (DATE(`End Date`))
LEFT JOIN `users` U
ON U.`UserID` = `Agent ID`
WHERE `Agent` IS NOT NULL
AND DATE(`End Date`) <= (`LastDayOfQuarter`)
AND DATE(`End Date`) >= (ADDDATE(`LastDayOfQuarter`, -180))
GROUP BY `Agent`, (ADDDATE(`LastDayOfQuarter`, -180))
I know the issue is due to the way I am joining the dates: since he doesn't have a result in the current year, the end-date-to-date join isn't picking up the desired date range. I can't seem to come up with any alternatives. Any help is appreciated.

I make the assumption that table date in your query is a calendar table, that stores the starts and ends of the quarters (most likely with one row per date in the quarter).
If so, you can solve this problem by rearranging the joins: first cross join the users and the calendar table to generate all possible combinations, then bring in the surveys table with a left join:
SELECT
U.UserID,
U.Position,
U.BranchDescription,
D.LastDayOfQuarter - interval 180 day AS MinDate,
D.LastDayOfQuarter AS MaxDate,
COUNT(DISTINCT Q.ResponseID) as SurveyCount,
AVG(CASE WHEN Q.QuestionID = 'Q1_2' THEN Q.`Answer Value` END) AS EngagedScore,
AVG(CASE WHEN Q.QuestionID = 'Q1_3' THEN Q.`Answer Value` END) AS KnowledgableScore,
AVG(CASE WHEN Q.QuestionID = 'Q1_6' THEN Q.`Answer Value` END) AS ValuedScore
FROM date D
CROSS JOIN users U
LEFT JOIN qualtrics_responses Q
ON Q.EndDate >= D.Date
AND Q.EndDate < D.Date + interval 1 day
AND U.UserID = Q.AgentID
AND Q.Agent IS NOT NULL
GROUP BY
U.UserID,
U.Position,
U.BranchDescription,
D.LastDayOfQuarter
Notes:
I adapted the date arithmetic - this assumes that you are using MySQL, as the syntax of the query suggests
You should really qualify all the columns in the query by prefixing them with the alias of the table they belong to; this makes the query so much easier to understand. I gave it a try; you might need to review that.
All non-aggregated columns should appear in the group by clause (also see the comment from Eric); this is a requirement in most databases, and good practice anywhere
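
The reason the rearranged joins help can be seen on a toy dataset. The sketch below uses in-memory SQLite from Python rather than MySQL, and simplifies the calendar to a hypothetical `quarters` table with one row per quarter end (the answer assumes one row per date); all table names, column names and rows are invented for illustration. Because users and quarters are combined first, an agent with no surveys in the window still gets a row, with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER)")
conn.execute("CREATE TABLE quarters (last_day TEXT)")  # simplified calendar
conn.execute("CREATE TABLE responses (agent_id INTEGER, end_date TEXT, score REAL)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO quarters VALUES ('2020-03-31')")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [
        (1, "2020-02-10", 4.0),
        (1, "2019-11-05", 5.0),
        (2, "2019-06-01", 3.0),  # agent 2 has nothing inside the 180-day window
    ],
)

query = """
SELECT u.user_id,
       q.last_day,
       COUNT(r.agent_id) AS survey_count,
       AVG(r.score) AS avg_score
FROM quarters AS q
CROSS JOIN users AS u
LEFT JOIN responses AS r
       ON r.agent_id = u.user_id
      AND r.end_date <= q.last_day
      AND r.end_date >= date(q.last_day, '-180 days')
GROUP BY u.user_id, q.last_day
ORDER BY u.user_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that the window conditions sit in the ON clause, not in WHERE; moving them to WHERE would filter out the agents that have no matching surveys.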

SQL Query - how to create a sum based on grouped numbers

I'd like to know how many messages (voicemailcount) were received for each rank (b.rankcode). For example, column 'b.rankcode' will have x number of people with a given rank who have received messages, counted in 'voicemailcount'. If possible, I'd only like to see the voicemailcount and rankcode columns.
select
count(*) as [voicemailcount], a.linkedusermailboxname, b.RankCode as 'Rank'
from
UMCallDataRecord a
join
UMADUserAccounts b on a.LinkedUserMailboxName = b.EmailAddress
where
a.CallType = 'callansweringvoicemessage'
and month(a.[date]) = month(GetDate()) - 1 -- change this per month -1 = lastmonth
group by
a.linkedusermailboxname, b.rankcode
order by
b.rankcode
TIA!

I believe that we need to remove "linkedusermailboxname" from the Select and from the GroupBy. The Join has to stay, because rankcode comes from the joined table >>
SELECT
b.rankcode AS 'Rank'
,Count(*) AS [voicemailcount]
FROM umcalldatarecord a
JOIN umaduseraccounts b ON a.linkedusermailboxname = b.emailaddress
WHERE a.calltype = 'callansweringvoicemessage'
AND Month(a.[date]) = Month(Getdate()) - 1
-- change this per month -1 = lastmonth
GROUP BY b.rankcode
ORDER BY b.rankcode

Seems like you just need to reduce the number of columns that you group by
SELECT
COUNT(*) AS [voicemailcount]
, ua.RankCode AS 'Rank'
FROM UMCallDataRecord AS cdr
JOIN UMADUserAccounts AS ua ON cdr.LinkedUserMailboxName = ua.EmailAddress
WHERE cdr.CallType = 'callansweringvoicemessage'
AND cdr.[date] >= dateadd(month, datediff(month,0, getDate())-1, 0) -- 1st day of last month
AND cdr.[date] < dateadd(month, datediff(month,0, getDate()), 0) -- 1st day of current month
GROUP BY
ua.rankcode
ORDER BY
ua.rankcode
However, I would like to propose a different way to define the date range, because your current approach (comparing only the month number) could include results from any year. The approach above limits the results to the previous month, and only the previous month. Note also that I do not use BETWEEN, which just isn't great for date ranges.
I also concur with "Bad habits to kick : using table aliases like (a, b, c) or (t1, t2, t3)": use meaningful aliases, not aliases based on "sequence in a query".
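
The two filtering styles can be compared on a few rows. This is a minimal sketch using in-memory SQLite from Python, so the date functions are SQLite's `strftime` and `date` rather than T-SQL's `month` and `dateadd`; the `calls` table, its rows and the fixed "today" value are hypothetical. The month-number filter also picks up a row from the previous year, while the half-open range (>= first day of last month, < first day of this month) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_date TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?)",
    [("2019-02-10",), ("2020-02-01",), ("2020-02-29",), ("2020-03-01",)],
)

today = "2020-03-15"  # fixed hypothetical "today" so the result is reproducible

# Month-number comparison: matches February of every year.
month_only = conn.execute(
    "SELECT COUNT(*) FROM calls "
    "WHERE CAST(strftime('%m', call_date) AS INTEGER) "
    "    = CAST(strftime('%m', ?) AS INTEGER) - 1",
    (today,),
).fetchone()[0]

# Half-open range: first day of last month up to, but excluding,
# the first day of the current month.
half_open = conn.execute(
    "SELECT COUNT(*) FROM calls "
    "WHERE call_date >= date(?, 'start of month', '-1 month') "
    "  AND call_date < date(?, 'start of month')",
    (today, today),
).fetchone()[0]

print(month_only, half_open)
```

The month-number version has a second problem: in January the expression evaluates to 0, which no month matches.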

Getting percentages of counts in SQL Server

I am building a SQL Server query that gets the number of leads generated from a certain source by month. This is the query that tells me the monthly count. But I want to add a column that shows those leads as a percentage of all leads for that month. I'm not clear on how to do this. Any help?
SELECT FORMAT([ProspectData].[dbo].[Real Estate.KPC.Leads.2018-08-08].[Created Date]
, 'yyyy-MM') AS 'YYYY-MM'
, 'Kiosk-Mall' AS 'Lead Source'
, COUNT(*) AS 'Monthly Total From That Lead Source'
FROM [ProspectData].[dbo].[Real Estate.KPC.Leads.2018-08-08]
WHERE [ProspectData].[dbo].[Real Estate.KPC.Leads.2018-08-08].[Lead Source] =
'Kiosk-Mall'
GROUP BY FORMAT([ProspectData].[dbo].[Real Estate.KPC.Leads.2018-08-08].[Created Date], 'yyyy-MM')
ORDER BY FORMAT([ProspectData].[dbo].[Real Estate.KPC.Leads.2018-08-08].[Created Date], 'yyyy-MM');

You can use conditional aggregation -- basically moving the WHERE condition to a CASE expression in the argument to an aggregation function:
SELECT FORMAT(l.[Created Date], 'yyyy-MM') AS YYYYMM,
'Kiosk-Mall' AS Lead_Source,
SUM(CASE WHEN l.[Lead Source] = 'Kiosk-Mall' THEN 1 ELSE 0 END) AS [Monthly Total From That Lead Source],
AVG(CASE WHEN l.[Lead Source] = 'Kiosk-Mall' THEN 1.0 ELSE 0 END) AS proportion_of_total
FROM [ProspectData].[dbo].[Real Estate.KPC.Leads.2018-08-08] l
GROUP BY FORMAT(l.[Created Date], 'yyyy-MM')
ORDER BY YYYYMM
Notes:
Table aliases make the query easier to write and to read.
It is better to choose column aliases that do not need to be escaped (i.e. no spaces, no punctuation).
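
A small worked example may help show why AVG over a 1.0/0 flag gives the proportion. The sketch below runs on in-memory SQLite from Python as a stand-in for SQL Server; the `leads` table, its pre-formatted `created_month` column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (created_month TEXT, lead_source TEXT)")
conn.executemany(
    "INSERT INTO leads VALUES (?, ?)",
    [
        ("2018-07", "Kiosk-Mall"), ("2018-07", "Web"),
        ("2018-07", "Web"), ("2018-07", "Kiosk-Mall"),
        ("2018-08", "Kiosk-Mall"), ("2018-08", "Web"),
        ("2018-08", "Web"), ("2018-08", "Web"),
    ],
)

query = """
SELECT created_month,
       SUM(CASE WHEN lead_source = 'Kiosk-Mall' THEN 1 ELSE 0 END) AS monthly_total,
       AVG(CASE WHEN lead_source = 'Kiosk-Mall' THEN 1.0 ELSE 0 END) AS proportion_of_total
FROM leads
GROUP BY created_month
ORDER BY created_month
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Every lead in the month contributes either 1.0 or 0 to the average, so the result is matching leads divided by all leads; multiply by 100 if a percentage is wanted.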

Sql Server - Joining subqueries using calculated fields

I am trying to calculate the percentage change in price between days. As the days are not consecutive, I built into the query a calculated field that tells me what relative day it is (day 1, day 2, etc). In order to compare today with yesterday, I offset the calculated day number by 1 in a subquery. What I want to do is to join the inner and outer query on the calculated relative day. The code I came up with is:
SELECT TOP 11
P.Date,
(AVG(P.SettlementPri) - PriceY) / PriceY as PriceChange,
P.Symbol,
(RANK() OVER (ORDER BY P.Date desc)) as dayrank_Today
FROM OTE P
JOIN (SELECT TOP 11
C.Date,
AVG(SettlementPri) as PriceY,
(RANK() OVER (ORDER BY C.Date desc))+1 as dayrank_Yest
FROM OTE C
WHERE C.ComCode = 'C-'
GROUP BY c.Date) C ON dayrank_Today = C.dayrank_Yest
WHERE P.ComCode = 'C-'
GROUP BY P.Symbol, P.Date
If I try to execute the query, I get an error message indicating dayrank_Today is an invalid column. I have tried renaming it, qualifying it, and yelling obscenities at it, and I get squat. Still an error.

You can't define a calculated column in a SELECT and then use it in a join at the same query level. You can use CTEs, which I'm not so familiar with, or you can just use derived tables like so:
SELECT
P.Date,
(P.AvgPrice - C.PriceY) / C.PriceY as PriceChange,
P.Symbol,
P.dayrank_Today
FROM
(SELECT TOP 11
ComCode,
Date,
AVG(SettlementPri) as AvgPrice,
Symbol,
(RANK() OVER (ORDER BY Date desc)) as dayrank_Today
FROM OTE WHERE ComCode = 'C-'
GROUP BY ComCode, Date, Symbol -- required because AVG is mixed with plain columns
ORDER BY Date desc) P -- makes TOP 11 deterministic
JOIN (SELECT TOP 11
C.Date,
AVG(SettlementPri) as PriceY,
(RANK() OVER (ORDER BY C.Date desc))-1 as dayrank_Yest -- ranks descend from the latest date, so the previous day has rank + 1; subtracting 1 lines it up with today
FROM OTE C
WHERE C.ComCode = 'C-'
GROUP BY C.Date
ORDER BY C.Date desc) C ON P.dayrank_Today = C.dayrank_Yest

If possible consider using a CTE as it makes it very easy. Something like this:
With Raw as
(
SELECT TOP 11 C.Date,
Avg(SettlementPri) As PriceY,
Rank() OVER (ORDER BY C.Date desc) as dayrank
FROM OTE C WHERE C.Comcode = 'C-'
Group by C.Date
Order by C.Date desc -- makes TOP 11 deterministic
)
select today.pricey as todayprice ,
yesterday.pricey as yesterdayprice,
(today.pricey - yesterday.pricey)/yesterday.pricey * 100 as percentchange -- change relative to the previous day, as in the question
from Raw today
left outer join Raw yesterday on yesterday.dayrank = today.dayrank + 1 -- ranks descend from the latest date, so the previous day has the next rank
Obviously this doesn't include the symbol, but that can be included pretty easily.
If the 'With' syntax doesn't suit, you can also use calculated fields with Outer Apply: http://technet.microsoft.com/en-us/library/ms175156.aspx
That said, the CTE means you only need to write your price calculation once, which is a lot cleaner.
Cheers
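
For anyone who wants to see the ranked self-join run end to end, here is a minimal sketch against in-memory SQLite from Python (window functions need SQLite 3.25 or later). It is a stand-in for the SQL Server query, not a copy of it: the `ote` table layout, the `trade_date` and `settlement_pri` column names and the three sample days are assumptions. The dates are deliberately non-consecutive, and because the rank descends from the latest date, the previous trading day is the row whose rank is one higher.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ote (trade_date TEXT, settlement_pri REAL)")
conn.executemany(
    "INSERT INTO ote VALUES (?, ?)",
    [("2020-01-02", 100.0), ("2020-01-03", 110.0), ("2020-01-07", 99.0)],
)

query = """
WITH daily AS (
    SELECT trade_date,
           AVG(settlement_pri) AS price,
           RANK() OVER (ORDER BY trade_date DESC) AS dayrank
    FROM ote
    GROUP BY trade_date
)
SELECT today.trade_date,
       today.price,
       yesterday.price,
       (today.price - yesterday.price) / yesterday.price * 100 AS percent_change
FROM daily AS today
LEFT JOIN daily AS yesterday
       ON yesterday.dayrank = today.dayrank + 1
ORDER BY today.trade_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The oldest day has no predecessor, so the left join leaves its previous price and percent change as NULL.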

I had the same problem and found this thread and found a solution so I thought I'd post it here.
Instead of using the column name as the parameter for ON, copy the expression that produced the column in the first place:
replace:
ON dayrank_Today = C.dayrank_Yest
with:
ON (RANK() OVER (ORDER BY Date desc)) = C.dayrank_Yest
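Granted, you're displeasing the Programming Gods by violating DRY, but you could be pragmatic and mention the duplication in the comments, which should appease their wrath to a mild grumbling. Be aware, though, that SQL Server only allows windowed functions in the SELECT and ORDER BY clauses, so it rejects RANK() inside ON; there, wrap the ranking in a derived table or CTE instead.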
