Counting Distinct Months with Multiple records per month

Counting Distinct Months with Multiple records per month - sql

My data looks like this:
Code Date
123 1/2/2016
123 1/4/2016
123 1/4/2016
123 2/5/2016
456 1/2/2016
456 1/3/2016
456 2/7/2016
789 1/7/2016
789 1/8/2016
789 3/7/2016
789 3/15/2016
I am looking for a distinct count of months grouped by the code.
So the results would look something like this
Code Jan2016 Feb2016 Mar2016
123 1 1 0
456 1 1 0
789 1 0 1
I feel like I may be overcomplicating my code.
So far I have
SELECT
p.code
,SUM(CASE WHEN p.date BETWEEN '11/1/2010' AND '11/30/2010'
THEN 1 ELSE 0 END) AS 'Nov2010'
FROM table
Group By p.code
But that is pulling in all records from Nov2010, when I just need to know if this exists

You can just change your aggregate function. Use MAX instead of SUM.
SELECT
p.code
,MAX(CASE WHEN p.date BETWEEN '11/1/2010' AND '11/30/2010'
THEN 1 ELSE 0 END) AS 'Nov2010'
FROM table
Group By p.code

in SQL Server you can use a pivot table, bit messy but something like this will work, sample data:
declare #table table (code int, date date)
insert into #table values
(123, '1/2/2016'),
(123, '1/4/2016'),
(123, '1/4/2016'),
(123, '2/5/2016'),
(456, '1/2/2016'),
(456, '1/3/2016'),
(456, '2/7/2016'),
(789, '1/7/2016'),
(789, '1/8/2016'),
(789, '3/7/2016'),
(789, '3/15/2016')
then using a pivot table:
with cte (code) as (select distinct code from #table)
select
c1.code,
ISNULL(months.[1],0) as 'Jan 2016',
ISNULL(months.[2],0) as 'Feb 2016',
ISNULL(months.[3],0) as 'Mar 2016'
from
(
select
c.code,
count( distinct t.code) as [ID],
month(date) as [month]
from #table t
join cte c on t.code = c.code
group by c.code, month(date)
) P
pivot
(
sum([ID])
for [month] IN ("1","2","3")--,"4","5","6","7","8","9","10","11","12")
) as months
join cte c1 on months.code = c1.code
will give you the following results:
code Jan 2016 Feb 2016 Mar 2016
123 1 1 0
456 1 1 0
789 1 0 1
If you take the comment out after the 3rd month you can do it for the rest of the year

Related

Using EXISTS within a GROUP BY clause

Is it possible to do the following:
I have a table that looks like this:
declare #tran_TABLE TABLE(
EOMONTH DATE,
AccountNumber INT,
CLASSIFICATION_NAME VARCHAR(50),
Value Float
)
INSERT INTO #tran_TABLE VALUES('2018-11-30','123','cat1',10)
INSERT INTO #tran_TABLE VALUES('2018-11-30','123','cat1',15)
INSERT INTO #tran_TABLE VALUES('2018-11-30','123','cat1',5 )
INSERT INTO #tran_TABLE VALUES('2018-11-30','123','cat2',10)
INSERT INTO #tran_TABLE VALUES('2018-11-30','123','cat3',12)
INSERT INTO #tran_TABLE VALUES('2019-01-31','123','cat1',5 )
INSERT INTO #tran_TABLE VALUES('2019-01-31','123','cat2',10)
INSERT INTO #tran_TABLE VALUES('2019-01-31','123','cat2',15)
INSERT INTO #tran_TABLE VALUES('2019-01-31','123','cat3',5 )
INSERT INTO #tran_TABLE VALUES('2019-01-31','123','cat3',2 )
INSERT INTO #tran_TABLE VALUES('2019-03-31','123','cat1',15)
EOMONTH AccountNumber CLASSIFICATION_NAME Value
2018-11-30 123 cat1 10
2018-11-30 123 cat1 15
2018-11-30 123 cat1 5
2018-11-30 123 cat2 10
2018-11-30 123 cat3 12
2019-01-31 123 cat1 5
2019-01-31 123 cat2 10
2019-01-31 123 cat2 15
2019-01-31 123 cat3 5
2019-01-31 123 cat3 2
2019-03-31 123 cat1 15
I want to produce a result where it will check whether in each month, for each AccountNumber (just one in this case) there exists a CLASSIFICATION_NAME cat1, cat2, cat3.
If all 3 exist for the month, then return 1 but if any are missing return 0.
The result should look like:
EOMONTH AccountNumber CLASSIFICATION_NAME
2018-11-30 123 1
2019-01-31 123 1
2019-03-31 123 0
But I want to do it as compactly as possible, without first creating a table that groups everything by CLASSIFICATION_NAME, EOMONTH and AccountNumber and then selects from that table.
For example, in the pseudo code below, is it possible to use maybe an EXISTS statement to do the group by?
SELECT
EOMONTH
,AccountNumber
,CASE WHEN EXISTS (CLASSIFICATION_NAME = 'cat1' AND 'cat2' AND 'cat3') THEN 1 ELSE 0 end
,SUM(Value) AS totalSpend
FROM #tran_TABLE
GROUP BY
EOMONTH
,AccountNumber

You could emulate this behavior by counting the distinct classifications that answer this condition (per group):
SELECT
EOMONTH
,AccountNumber
,CASE COUNT(DISTINCT CASE WHEN classification_name IN ('cat1', 'cat2', 'cat3') THEN classification_name END)
WHEN 3 THEN 1
ELSE 0
END
,SUM(Value) AS totalSpend
FROM #tran_TABLE
GROUP BY
EOMONTH
,AccountNumber

Try this-
SELECT EOMONTH,
AccountNumber,
CASE
WHEN COUNT(DISTINCT CLASSIFICATION_NAME) = 3 THEN 1
ELSE 0
END CLASSIFICATION_NAME
FROM #tran_TABLE
GROUP BY EOMONTH,AccountNumber
Output is-
2018-11-30 123 1
2019-01-31 123 1
2019-03-31 123 0

Query like this. You can count distinct values.
When you count unique values then column 'Three_Unique_Cat'. When you count exactly 'cat1','cat2','cat3' then column 'Three_Cat1_Cat2_Cat3'
SELECT
EOMONTH, AccountNumber
,CASE WHEN
COUNT(DISTINCT CLASSIFICATION_NAME)=3 THEN 1
ELSE 0
END AS 'Three_Unique_Cat'
,CASE WHEN
COUNT(DISTINCT CASE WHEN CLASSIFICATION_NAME IN ('cat1','cat2','cat3')
THEN CLASSIFICATION_NAME ELSE NULL END)=3 THEN 1
ELSE 0
END AS 'Three_Cat1_Cat2_Cat3'
,SUM(Value) AS totalSpend
FROM #tran_TABLE
GROUP BY EOMONTH, AccountNumber
Output:
EOMONTH AccountNumber Three_Unique_Cat Three_Cat1_Cat2_Cat3 totalSpend
2018-11-30 123 1 1 52
2019-01-31 123 1 1 37
2019-03-31 123 0 0 15

It's easy, just as below:
select
EOMONTH,
AccountNumber,
case when count(distinct CLASSIFICATION_NAME) = 3 then 1 else 0 end as CLASSIFICATION_NAME
from
tran_TABLE
group by
EOMONTH,
AccountNumber

Get sum of entries over last 6 months (incomplete months)

My data looks something like this
ProductNumber | YearMonth | Number
1 201803 1
1 201804 3
1 201810 6
2 201807 -3
2 201809 5
Now what I want to have is add an additional entry "6MSum" which is the sum of the last 6 months per ProductNumber (not the last 6 entries).
Please be aware the YearMonth data is not complete, for every ProductNumber there are gaps in between so I cant just use the last 6 entries for the sum. The final result should look something like this.
ProductNumber | YearMonth | Number | 6MSum
1 201803 1 1
1 201804 3 4
1 201810 6 9
2 201807 -3 -3
2 201809 5 2
Additionally I don't want to insert the sum to the table but instead use it in a query like:
SELECT [ProductNumber],[YearMonth],[Number],
6MSum = CONVERT(INT,SUM...)
FROM ...
I found a lot off solutions that use a "sum over period" but only for the last X entries and not for the actual conditional statement of "YearMonth within last 6 months".
Any help would be much appreciated!
Its a SQL Database
EDIT/Answer
It seems to be the case that the gaps within the months have to be filled with data, afterwards something like
sum(Number) OVER (PARTITION BY category
ORDER BY year, week
ROWS 6 PRECEDING) AS 6MSum
Should work.
Reference to the solution : https://dba.stackexchange.com/questions/181773/sum-of-previous-n-number-of-columns-based-on-some-category

You could go the OUTER APPLY route. The following produces your required results exactly:
-- prep data
SELECT
ProductNumber , YearMonth , Number
into #t
FROM ( values
(1, 201803 , 1 ),
(1, 201804 , 3 ),
(1, 201810 , 6 ),
(2, 201807 , -3 ),
(2, 201809 , 5 )
) s (ProductNumber , YearMonth , Number)
-- output
SELECT
ProductNumber
,YearMonth
,Number
,[6MSum]
FROM #t t
outer apply (
SELECT
sum(number) as [6MSum]
FROM #t it
where
it.ProductNumber = t.ProductNumber
and it.yearmonth <= t.yearmonth
and t.yearmonth - it.yearmonth between 0 and 6
) tt
drop table #t

Use outer apply and convert yearmonth to a date, something like this:
with t as (
select t.*,
convert(date, convert(varchar(255), yearmonth) + '01')) as ymd
from yourtable t
)
select t.*, t2.sum_6m
from t outer apply
(select sum(t2.number) as sum_6m
from t t2
where t2.productnumber = t.productnumber and
t2.ymd <= t.ymd and
t2.ymd > dateadd(month, -6, ymd)
) t2;

Just to provide one more option. You can use DATEFROMPARTS to build valid dates from the YearMonth value and then search for values within date ranges.
Testable here: https://rextester.com/APJJ99843
SELECT
ProductNumber , YearMonth , Number
INTO #t
FROM ( values
(1, 201803 , 1 ),
(1, 201804 , 3 ),
(1, 201810 , 6 ),
(2, 201807 , -3 ),
(2, 201809 , 5 )
) s (ProductNumber , YearMonth , Number)
SELECT *
,[6MSum] = (SELECT SUM(number) FROM #t WHERE
ProductNumber = t.ProductNumber
AND DATEFROMPARTS(LEFT(YearMonth,4),RIGHT(YearMonth,2),1) --Build a valid start of month date
BETWEEN
DATEADD(MONTH,-6,DATEFROMPARTS(LEFT(t.YearMonth,4),RIGHT(t.YearMonth,2),1)) --Build a valid start of month date 6 months back
AND DATEFROMPARTS(LEFT(t.YearMonth,4),RIGHT(t.YearMonth,2),1)) --Build a valid end of month date
FROM #t t
DROP TABLE #t

So a working query (provided by a colleauge of mine) can look like this
SELECT [YearMonth]
,[Number]
,[ProductNumber]
, (Select Sum(Number) from [...] DPDS_1 where DPDS.ProductNumber =
DPDS_1.ProductNumber and DPDS_1.YearMonth <= DPDS.YearMonth and DPDS_1.YearMonth >=
convert (int, left (convert (varchar, dateadd(mm, -6, DPDS.YearMonth + '01'), 112),
6)))FROM [...] DPDS

SQL: CTE query Speed

I am using SQL Server 2008 and am trying to increase the speed of my query below. The query assigns points to patients based on readmission dates.
Example: A patient is seen on 1/2, 1/5, 1/7, 1/8, 1/9, 2/4. I want to first group visits within 3 days of each other. 1/2-5 are grouped, 1/7-9 are grouped. 1/5 is NOT grouped with 1/7 because 1/5's actual visit date is 1/2. 1/7 would receive 3 points because it is a readmit from 1/2. 2/4 would also receive 3 points because it is a readmit from 1/7. When the dates are grouped the first date is the actual visit date.
Most articles suggest limiting the data set or adding indexes to increase speed. I have limited the amount of rows to about 15,000 and added a index. When running the query with 45 test visit dates/ 3 test patients, the query takes 1.5 min to run. With my actual data set it takes > 8 hrs.
How can I get this query to run < 1 hr? Is there a better way to write my query? Does my Index look correct? Any help would be greatly appreciated.
Example expected results below query.
;CREATE TABLE RiskReadmits(MRN INT, VisitDate DATE, Category VARCHAR(15))
;CREATE CLUSTERED INDEX Risk_Readmits_Index ON RiskReadmits(VisitDate)
;INSERT RiskReadmits(MRN,VisitDate,CATEGORY)
VALUES
(1, '1/2/2016','Inpatient'),
(1, '1/5/2016','Inpatient'),
(1, '1/7/2016','Inpatient'),
(1, '1/8/2016','Inpatient'),
(1, '1/9/2016','Inpatient'),
(1, '2/4/2016','Inpatient'),
(1, '6/2/2016','Inpatient'),
(1, '6/3/2016','Inpatient'),
(1, '6/5/2016','Inpatient'),
(1, '6/6/2016','Inpatient'),
(1, '6/8/2016','Inpatient'),
(1, '7/1/2016','Inpatient'),
(1, '8/1/2016','Inpatient'),
(1, '8/4/2016','Inpatient'),
(1, '8/15/2016','Inpatient'),
(1, '8/18/2016','Inpatient'),
(1, '8/28/2016','Inpatient'),
(1, '10/12/2016','Inpatient'),
(1, '10/15/2016','Inpatient'),
(1, '11/17/2016','Inpatient'),
(1, '12/20/2016','Inpatient')
;WITH a AS (
SELECT
z1.VisitDate
, z1.MRN
, (SELECT MIN(VisitDate) FROM RiskReadmits WHERE VisitDate > DATEADD(day, 3, z1.VisitDate)) AS NextDay
FROM
RiskReadmits z1
WHERE
CATEGORY = 'Inpatient'
), a1 AS (
SELECT
MRN
, MIN(VisitDate) AS VisitDate
, MIN(NextDay) AS NextDay
FROM
a
GROUP BY
MRN
), b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
a1
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
a
JOIN b
ON a.VisitDate = b.NextDay
), c AS (
SELECT
MRN,
VisitDate
, (SELECT MAX(VisitDate) FROM b WHERE b1.VisitDate > VisitDate AND b.MRN = b1.MRN) AS PreviousVisitDate
FROM
b b1
)
SELECT distinct
c1.MRN,
c1.VisitDate
, CASE
WHEN DATEDIFF(day,c1.PreviousVisitDate,c1.VisitDate) < 30 THEN PreviousVisitDate
ELSE NULL
END AS ReAdmissionFrom
, CASE
WHEN DATEDIFF(day,c1.PreviousVisitDate,c1.VisitDate) < 30 THEN 3
ELSE 0
END AS Points
FROM
c c1
ORDER BY c1.MRN
Expected Results:
MRN VisitDate ReAdmissionFrom Points
1 2016-01-02 NULL 0
1 2016-01-07 2016-01-02 3
1 2016-02-04 2016-01-07 3
1 2016-06-02 NULL 0
1 2016-06-06 2016-06-02 3
1 2016-07-01 2016-06-06 3
1 2016-08-01 NULL 0
1 2016-08-15 2016-08-01 3
1 2016-08-28 2016-08-15 3
1 2016-10-12 NULL 0
1 2016-11-17 NULL 0
1 2016-12-20 NULL 0

oops I changed the names of a few cte's (and the post messed up what was code)
It should be like this:
b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
a1
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
a AS a
JOIN b
ON a.VisitDate = b.NextDay AND a.MRN = b.MRN
)

I'm going to take a wild guess here and say you want to change the b cte to
have AND a.MRN = b.MRN as a second condition in the second select query like this:
, b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
firstVisitAndFollowUp
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
visitsDistance3daysOrMore AS a
JOIN b
ON a.VisitDate = b.NextDay AND a.MRN = b.MRN
)

getting list for each employee for entire tenure

I have a table employee with the columns name, startdate, endate:
name | startdate | enddate
--------------------------------
A | 12/12/2012 | 12/12/2014
B | 05/08/2006 | 07/11/2009
I want result like this:
name | Year of Employee
-------------------------
A | 2012
A | 2013
A | 2014
B | 2006
B | 2007
B | 2008
B | 2009
Do I have to use a loop and/or cross join here?

Create a sample table
CREATE TABLE #test(name VARCHAR(10), startdate DATE, enddate DATE)
Inserting some sample data
INSERT INTO #test VALUES ('A', '12/12/2012', '12/12/2014')
,('B', '05/08/2006', '07/11/2009')
Using recursive CTE
;WITH CTE AS (
SELECT name, DATEPART(YEAR, startdate) AS yr FROM #test
UNION ALL
SELECT #test.name, yr + 1
FROM CTE
INNER JOIN #test ON #test.name = CTE.name
WHERE yr < DATEPART(YEAR, enddate)
)
SELECT name, yr AS [Year of Employee]
FROM CTE
ORDER BY name, yr
Output
name Year of Employee
A 2012
A 2013
A 2014
B 2006
B 2007
B 2008
B 2009

Define and populate a CalYear table that delivers this :
CalYearStartDate CalYearEndDate
2005-01-01 2005-12-12
2006-01-01 2006-12-12
2007-01-01 2007-12-12
2008-01-01 2008-12-12
2009-01-01 2009-12-12
2010-01-01 2010-12-12
2011-01-01 2011-12-12
2012-01-01 2012-12-12
2013-01-01 2013-12-12
2014-01-01 2014-12-12
Define and populate an Emp table that delivers this :
EmpName StartDate EndDate
A 2012-12-12 2014-12-12
B 2006-05-08 2009-07-11
Use this query to retreive results:
Select EmpName,
CalYearStartDate
From Emp
Inner
Join CalYear
on YEAR(CalYearStartDate) >= YEAR(Emp.StartDate)
and YEAR(CalYearEndDate) <= YEAR(Emp.EndDate)

Create statement:
CREATE TABLE #test(name VARCHAR(10), startdate DATE, enddate DATE)
INSERT INTO #test VALUES ('A', '12/12/2012', '12/12/2014')
,('B', '05/08/2006', '07/11/2009')
Query:
SELECT t.name, YEAR(DATEADD(year,n.number, t.startdate)) AS Year
FROM #test t,
(SELECT number
FROM master..spt_values
WHERE [type] = 'P') n
WHERE startdate <= DATEADD(year,n.number, t.startdate)
AND enddate >= DATEADD(year,n.number, t.startdate)
Result:
name Year
A 2012
A 2013
A 2014
B 2006
B 2007
B 2008
B 2009

Three column SQL PIVOT

How do I do a sql pivot of data that looks like this, USING the SQL PIVOT command ?
id | field | value
---------------------------------------
1 | year | 2011
1 | month | August
2 | year | 2009
1 | day | 21
2 | day | 31
2 | month | July
3 | year | 2010
3 | month | January
3 | day | NULL
Into something that looks like this:
id | year | month | day
-----------------------------
1 2011 August 21
2 2010 July 31
3 2009 January NULL

Try something like this:
DECLARE #myTable AS TABLE([ID] INT, [Field] VARCHAR(20), [Value] VARCHAR(20))
INSERT INTO #myTable VALUES ('1', 'year', '2011')
INSERT INTO #myTable VALUES ('1', 'month', 'August')
INSERT INTO #myTable VALUES ('2', 'year', '2009')
INSERT INTO #myTable VALUES ('1', 'day', '21')
INSERT INTO #myTable VALUES ('2', 'day', '31')
INSERT INTO #myTable VALUES ('2', 'month', 'July')
INSERT INTO #myTable VALUES ('3', 'year', '2010')
INSERT INTO #myTable VALUES ('3', 'month', 'January')
INSERT INTO #myTable VALUES ('3', 'day', NULL)
SELECT [ID], [year], [month], [day]
FROM
(
SELECT [ID], [Field], [Value] FROM #myTable
) t
PIVOT
(
MIN([Value]) FOR [Field] IN ([year], [month], [day])
) AS pvt
ORDER BY pvt.[year] DESC
Which will yield results of:
ID year month day
1 2011 August 21
3 2010 January NULL
2 2009 July 31

;WITH DATA(id,field,value) AS
(
SELECT 1,'year','2011' UNION ALL
SELECT 1,'month','August' UNION ALL
SELECT 2,'year','2009' UNION ALL
SELECT 1,'day ','21' UNION ALL
SELECT 2,'day ','31' UNION ALL
SELECT 2,'month','July' UNION ALL
SELECT 3,'year','2010' UNION ALL
SELECT 3,'month','January' UNION ALL
SELECT 3,'day ',NULL
)
SELECT id,
year,
month,
day
FROM DATA PIVOT (MAX(value) FOR field IN ([year], [month], [day])) AS Pvt

SELECT
id,
MAX(CASE WHEN RK=3 THEN VAL ELSE '' END) AS "YEAR",
MAX(CASE WHEN RK=2 THEN VAL ELSE '' END) AS "MONTH",
MAX(CASE WHEN RK=1 THEN VAL ELSE '' END) AS "DAY"
FROM
(
SELect
ID,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY YEAR1 ASC) RK,
VAL
FROM TEST3)A
GROUP BY 1
ORDER BY 1;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Counting Distinct Months with Multiple records per month - sql

You can just change your aggregate function. Use MAX instead of SUM. SELECT p.code ,MAX(CASE WHEN p.date BETWEEN '11/1/2010' AND '11/30/2010' THEN 1 ELSE 0 END) AS 'Nov2010' FROM table Group By p.code

Related

Using EXISTS within a GROUP BY clause

Get sum of entries over last 6 months (incomplete months)

SQL: CTE query Speed

getting list for each employee for entire tenure

Three column SQL PIVOT

Categories

Resources