I am working in MS SQL Server 2017 on counting member ED visits where a member may go months between ER visits or may have multiple consecutive days each with a visit. The rule that I am trying to calculate is this:
If a member has more than one ED visit in an 8-day period, include
only the first eligible ED visit. For example, if a member has an
eligible ED visit on January 1, include the January 1 visit and do not
include ED visits that occur on or between January 2 and January 8.
Then, if applicable, include the next eligible ED visit that occurs on
or after January 9. Identify visits chronologically, including only
one visit per 8-day period.
If the days since the last visit is NULL or >= 8 then that is always counted and I have that. The issue I am having is how to look at the running total of Days_Since_Last_Visit to find the next valid visit when there are multiple in a given 8 day period.
In the example below the rows marked in green are flagged for inclusion because the Days_Since_Last_Visit is NULL or >= 8.
The rows highlighted yellow are the ones that should be counted as the first valid visit in the next 8 day period. The bold outline shows the days that are adding up to reach the threshold of 8.
Example data with highlighted entries that should be counted
I have prepared the SQL for the example in the image hoping someone can help me get unstuck.
IF OBJECT_ID('tempdb..#Example') IS NOT NULL
DROP TABLE #Example
CREATE TABLE
#Example (
Subscriber_ID VARCHAR(16),
Member_Seq VARCHAR(2),
Measurement_Year INT,
Visit_Date DATETIME,
)
INSERT INTO #Example (Subscriber_ID,Member_Seq,Measurement_Year,Visit_Date) VALUES
('788768646','02','2019','2019-07-09'),
('788768646','02','2019','2019-08-05'),
('788768646','02','2019','2019-08-18'),
('788768646','02','2019','2019-09-13'),
('788768646','02','2019','2019-09-15'),
('788768646','02','2019','2019-09-19'),
('788768646','02','2019','2019-09-25'),
('788768646','02','2019','2019-10-14'),
('788768646','02','2019','2019-10-21'),
('788768646','02','2019','2019-10-24'),
('788768646','02','2019','2019-10-27'),
('788768646','02','2019','2019-10-28'),
('788768646','02','2019','2019-11-03'),
('788768646','02','2019','2019-11-06'),
('788768646','02','2019','2019-11-18'),
('788768646','02','2019','2019-12-11')
SELECT y.Subscriber_ID,
y.Member_Seq,
y.Measurement_Year,
y.Visit_Date,
y.Prior_Visit_Date,
y.Days_Since_Last_Visit,
CASE
WHEN Days_Since_Last_Visit >= 8 OR Days_Since_Last_Visit IS NULL THEN
'Y'
ELSE
NULL
END Include_Visist,
CASE
WHEN Days_Since_Last_Visit >= 8 OR Days_Since_Last_Visit IS NULL THEN
NULL
ELSE
SUM (CASE
WHEN Days_Since_Last_Visit >= 8 OR Days_Since_Last_Visit IS NULL THEN
NULL
ELSE
y.Days_Since_Last_Visit
END
) OVER (PARTITION BY y.Subscriber_ID, y.Member_Seq,y.Measurement_Year ORDER BY y.Visit_Date)
END Running_Total
FROM (
SELECT x.Subscriber_ID,
x.Member_Seq,
x.Measurement_Year,
x.Visit_Date,
LAG(Visit_Date) OVER (
PARTITION BY x.Subscriber_ID, x.Member_Seq, x.Measurement_Year
ORDER BY x.Visit_Date) Prior_Visit_Date,
DATEDIFF(DAY,
LAG(Visit_Date) OVER (
PARTITION BY x.Subscriber_ID, x.Member_Seq, x.Measurement_Year
ORDER BY x.Visit_Date),
x.Visit_Date) Days_Since_Last_Visit
FROM #Example x
) y
This was challenging. I added some additional test records to make sure it handled a long string of visits close together over several periods.
WITH Example as (
SELECT Subscriber_ID, Member_Seq, Measurement_Year, CAST(Visit_Date as datetime) as [Visit_Date]
FROM (
VALUES
('788768646','02','2019','2019-06-01'),
('788768646','02','2019','2019-06-09'),
('788768646','02','2019','2019-07-09'),
('788768646','02','2019','2019-08-05'),
('788768646','02','2019','2019-08-18'),
('788768646','02','2019','2019-09-13'),
('788768646','02','2019','2019-09-15'),
('788768646','02','2019','2019-09-19'),
('788768646','02','2019','2019-09-25'),
('788768646','02','2019','2019-10-14'),
('788768646','02','2019','2019-10-21'),
('788768646','02','2019','2019-10-24'),
('788768646','02','2019','2019-10-27'),
('788768646','02','2019','2019-10-28'),
('788768646','02','2019','2019-11-03'),
('788768646','02','2019','2019-11-06'),
('788768646','02','2019','2019-11-18'),
('788768646','02','2019','2019-12-11'),
('788768646','02','2020','2020-01-01'),
('788768646','02','2020','2020-01-08'),
('788768646','02','2020','2020-01-09'),
('788768646','02','2020','2020-01-16'),
('788768646','02','2020','2020-01-17'),
('788768646','02','2020','2020-01-24'),
('788768646','02','2020','2020-01-25'),
('788768699','02','2019','2019-06-06'),
('788768699','02','2019','2019-06-07'),
('788768699','02','2019','2019-07-17'),
('788768699','02','2019','2019-08-23')
) t (Subscriber_ID, Member_Seq, Measurement_Year, Visit_Date)
), AllVisits as (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Subscriber_ID ORDER BY Visit_Date) as [VisitSeq]
FROM Example
), Report as (
SELECT *, Visit_Date as [PeriodStart]
FROM AllVisits
WHERE VisitSeq = 1
UNION ALL
SELECT a.*,
-- if not in the prior period, start a new 8 day period
CASE WHEN a.Visit_Date > r.PeriodStart + 7 THEN a.Visit_Date ELSE r.PeriodStart END
FROM AllVisits a
INNER JOIN Report r
ON r.Subscriber_ID = a.Subscriber_ID AND r.[VisitSeq] + 1 = a.[VisitSeq]
)
select *,
PeriodStart + 7 AS [PeriodEnD],
CASE WHEN Visit_Date = PeriodStart THEN 1 ELSE 0 END as [IsFirstDayOfPeriod]
from Report
where Visit_Date = PeriodStart
order by Subscriber_ID, Visit_Date
The query below gets most of desired records, but not all of them. (This is where it misses a long string of close visits.) I wanted to start here, but but I could not use a subquery or a group by in a recursive part of the CTE. I would have to do something like the above, but the anchor would need both the sequence start and finish. Then recursion for each anchor record would be within it's range. I might try it someday.
SELECT *
FROM Example e
WHERE NOT EXISTS( -- those with no prior within 8 days
SELECT *
FROM Example x
WHERE x.Subscriber_ID = e.Subscriber_ID
AND x.Visit_Date < e.Visit_Date -- prior
AND x.Visit_Date > e.Visit_Date - 8 -- within 8 days
)
Related
I have a table on the database that contains statuses updated on each vehicle I have, I want to calculate how many days each vehicle spends time between two specific statuses 'Maintenance' and 'Read'.
My table looks something like this
and I want to result to be like this, only show the number of days a vehicle spends in maintenance before becoming ready on a specific day
The code I written looks like this
drop table if exists #temps1
select
VehicleId,
json_value(VehiclesHistoryStatusID.text,'$.en') as VehiclesHistoryStatus,
VehiclesHistory.CreationTime,
datediff(day, VehiclesHistory.CreationTime ,
lead(VehiclesHistory.CreationTime ) over (order by VehiclesHistory.CreationTime ) ) as days,
lag(json_value(VehiclesHistoryStatusID.text,'$.en')) over (order by VehiclesHistory.CreationTime) as PrevStatus,
case
when (lag(json_value(VehiclesHistoryStatusID.text,'$.en')) over (order by VehiclesHistory.CreationTime) <> json_value(VehiclesHistoryStatusID.text,'$.en')) THEN datediff(day, VehiclesHistory.CreationTime , (lag(VehiclesHistory.CreationTime ) over (order by VehiclesHistory.CreationTime ))) else 0 end as testing
into #temps1
from fleet.VehicleHistory VehiclesHistory
left join Fleet.Lookups as VehiclesHistoryStatusID on VehiclesHistoryStatusID.Id = VehiclesHistory.StatusId
where (year(VehiclesHistory.CreationTime) > 2021 and (VehiclesHistory.StatusId = 140 Or VehiclesHistory.StatusId = 144) )
group by VehiclesHistory.VehicleId ,VehiclesHistory.CreationTime , VehiclesHistoryStatusID.text
order by VehicleId desc
drop table if exists #temps2
select * into #temps2 from #temps1 where testing <> 0
select * from #temps2
Try this
SELECT innerQ.VehichleID,innerQ.CreationDate,innerQ.Status
,SUM(DATEDIFF(DAY,innerQ.PrevMaintenance,innerQ.CreationDate)) AS DayDuration
FROM
(
SELECT t1.VehichleID,t1.CreationDate,t1.Status,
(SELECT top(1) t2.CreationDate FROM dbo.Test t2
WHERE t1.VehichleID=t2.VehichleID
AND t2.CreationDate<t1.CreationDate
AND t2.Status='Maintenance'
ORDER BY t2.CreationDate Desc) AS PrevMaintenance
FROM
dbo.Test t1 WHERE t1.Status='Ready'
) innerQ
WHERE innerQ.PrevMaintenance IS NOT NULL
GROUP BY innerQ.VehichleID,innerQ.CreationDate,innerQ.Status
In this query first we are finding the most recent 'maintenance' date before each 'ready' date in the inner most query (if exists). Then calculate the time span with DATEDIFF and sum all this spans for each vehicle.
I'm trying hard to extract the data in the format I need, but unsuccessful til now.
I have the following table
id_ticket, date_ticket, office_ticket, status_ticket
I need the query to return me, for EVERY MONTH, and always for the same OFFICE:
the number of tickets (COUNT) with any status
the number of tickets (COUNT) with status = 5
the number of tickets (COUNT) with status = 6
Month
Year
The query I made to return ONLY the total amount of tickets with any status was this. It worked!
SELECT
COUNT (id_ticket) as TotalTicketsPerMonth,
'sYear' = YEAR (date_ticket),
'sMonth' = MONTH (date_ticket)
FROM crm_vw_Tickets
WHERE office_ticket = 1
GROUP BY
YEAR (date_ticket), MONTH (date_ticket)
ORDER BY sYear ASC, sMonth ASC
Returning the total amount of ticket with status=5
SELECT
COUNT (id_ticket) as TotalTicketsPerMonth,
'sYear' = YEAR (date_ticket),
'sMonth' = MONTH (date_ticket)
FROM crm_vw_Tickets
WHERE office_ticket = 1 AND status_ticket = 5
GROUP BY
YEAR (date_ticket), MONTH (date_ticket)
ORDER BY sYear ASC, sMonth ASC
But I need the return to be something like:
Year Month Total Status5 Status6
2018 1 15 5 3
2018 2 14 4 5
2018 3 19 2 8
Thank you for your help.
You are close. You can use a CASE Expression to get what you need:
SELECT
COUNT (id_ticket) as TotalTicketsPerMonth,
SUM(CASE WHEN status_ticket = 5 THEN 1 END) as Status5,
SUM(CASE WHEN status_ticket = 6 THEN 1 END) as Status6,
'sYear' = YEAR (date_ticket),
'sMonth' = MONTH (date_ticket)
FROM crm_vw_Tickets
WHERE office_ticket = 1
GROUP BY YEAR (date_ticket), MONTH (date_ticket)
ORDER BY sYear ASC, sMonth ASC
The following code builds off JNevill's answer to include summary rows for "missing" months, i.e. those with no tickets, as well as months with tickets. The basic idea is to create a table of all of the months from the first to the last ticket, outer join the ticket data with the months and then summarize the data. (Tally table, numbers table and calendar table are more or less applicable terms.)
It is a Common Table Expression (CTE) that contains several queries that work step-by-step toward the result. You can see the results of the intermediate steps by replacing the final select statement with one of the ones commented out above it.
-- Sample data.
declare #crm_vw_Tickets as Table ( id_ticket Int Identity, date_ticket Date, office_ticket Int, status_ticket Int );
insert into #crm_vw_Tickets ( date_ticket, office_ticket, status_ticket ) values
( '20190305', 1, 6 ), -- Shrove Tuesday.
( '20190501', 1, 5 ), -- May Day.
( '20190525', 1, 5 ); -- Towel Day.
select * from #crm_vw_Tickets;
-- Summarize the data.
with
-- Get the minimum and maximum ticket dates for office_ticket 1.
Limits as (
select Min( date_ticket ) as MinDateTicket, Max( date_ticket ) as MaxDateTicket
from #crm_vw_Tickets
where office_ticket = 1 ),
-- 0 to 9.
Ten ( Number ) as ( select * from ( values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9) ) as Digits( Number ) ),
-- 100 rows.
TenUp2 ( Number ) as ( select 42 from Ten as L cross join Ten as R ),
-- 10000 rows. We'll assume that 10,000 months should cover the reporting range.
TenUp4 ( Number ) as ( select 42 from TenUp2 as L cross join TenUp2 as R ),
-- 1 to the number of months to summarize.
Numbers ( Number ) as ( select top ( select DateDiff( month, MinDateTicket, MaxDateTicket ) + 1 from Limits ) Row_Number() over ( order by ( select NULL ) ) from TenUp4 ),
-- Starting date of each month to summarize.
Months as (
select DateAdd( month, N.Number - 1, DateAdd( day, 1 - Day( L.MinDateTicket ), L.MinDateTicket ) ) as StartOfMonth
from Limits as L cross join
Numbers as N ),
-- All tickets assigned to the appropriate month and a row with NULL ticket data
-- for each month without tickets.
MonthsAndTickets as (
select M.StartOfMonth, T.*
from Months as M left outer join
#crm_vw_Tickets as T on M.StartOfMonth <= T.date_ticket and T.date_ticket < DateAdd( month, 1, M.StartOfMonth ) )
-- Use one of the following select statements to see the intermediate or final results:
--select * from Limits;
--select * from Ten;
--select * from TenUp2;
--select * from TenUp4;
--select * from Numbers;
--select * from Months;
--select * from MonthsAndTickets;
select Year( StartOfMonth ) as SummaryYear, Month( StartOfMonth ) as SummaryMonth,
Count( id_ticket ) as TotalTickets,
Coalesce( Sum( case when status_ticket = 5 then 1 end ), 0 ) as Status5Tickets,
Coalesce( Sum( case when status_ticket = 6 then 1 end ), 0 ) as Status6Tickets
from MonthsAndTickets
where office_ticket = 1 or office_ticket is NULL -- Handle months with no tickets.
group by StartOfMonth
order by StartOfMonth;
Note that the final select uses Count( id_ticket ), Coalesce and an explicit check for NULL to produce appropriate output values (0) for months with no tickets.
I'm not sure if this has been asked before because I'm having trouble even asking it myself. I think the best way to explain my dilemma is to use an example.
Say I've rated my happiness on a scale of 1-10 every day for 10 years and I have the results in a big table where I have a single date correspond to a single integer value of my happiness rating. I say, though, that I only care about my happiness over 60 day periods on average (this may seem weird but this is a simplified example). So I wrap up this information to a table where I now have a start date field, an end date field, and an average rating field where the start days are every day from the first day to the last over all 10 years, but the end dates are exactly 60 days later. To be clear, these 60 day periods are overlapping (one would share 59 days with the next one, 58 with the next, and so on).
Next I pick a threshold rating, say 5, where I want to categorize everything below it into a "bad" category and everything above into a "good" category. I could easily add another field and use a case structure to give every 60-day range a "good" or "bad" flag.
Then to sum it up, I want to display the total periods of "good" and "bad" from maximum beginning to maximum end date. This is where I'm stuck. I could group by the good/bad category and then just take min(start date) and max(end date), but then if, say, the ranges go from good to bad to good then to bad again, output would show overlapping ranges of good and bad. In the aforementioned situation, I would want to show four different ranges.
I realize this may seem clearer to me that it would to someone else so if you need clarification just ask.
Thank you
---EDIT---
Here's an example of what the before would look like:
StartDate| EndDate| MoodRating
------------+------------+------------
1/1/1991 |3/1/1991 | 7
1/2/1991 |3/2/1991 | 7
1/3/1991 |3/3/1991 | 4
1/4/1991 |3/4/1991 | 4
1/5/1991 |3/5/1991 | 7
1/6/1991 |3/6/1991 | 7
1/7/1991 |3/7/1991 | 4
1/8/1991 |3/8/1991 | 4
1/9/1991 |3/9/1991 | 4
And the after:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/2/1991 |good
1/3/1991|3/4/1991 |bad
1/5/1991|3/6/1991 |good
1/7/1991|3/9/1991 |bad
Currently my query with the group by rating would show:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/6/1991 |good
1/3/1991|3/9/1991 |bad
This is something along the lines of
select min(StartDate), max(EndDate), Good_Bad
from sourcetable
group by Good_Bad
While Jason A Long's answer may be correct - I can't read it or figure it out, so I figured I would post my own answer. Assuming that this isn't a process that you're going to be constantly running, the CURSOR's performance hit shouldn't matter. But (at least to me) this solution is very readable and can be easily modified.
In a nutshell - we insert the first record from your source table into our results table. Next, we grab the next record and see if the mood score is the same as the previous record. If it is, we simply update the previous record's end date with the current record's end date (extending the range). If not, we insert a new record. Rinse, repeat. Simple.
Here is your setup and some sample data:
DECLARE #MoodRanges TABLE (StartDate DATE, EndDate DATE, MoodRating int)
INSERT INTO #MoodRanges
VALUES
('1/1/1991','3/1/1991', 7),
('1/2/1991','3/2/1991', 7),
('1/3/1991','3/3/1991', 4),
('1/4/1991','3/4/1991', 4),
('1/5/1991','3/5/1991', 7),
('1/6/1991','3/6/1991', 7),
('1/7/1991','3/7/1991', 4),
('1/8/1991','3/8/1991', 4),
('1/9/1991','3/9/1991', 4)
Next, we can create a table to store our results, as well as some variable placeholders for our cursor:
DECLARE #MoodResults TABLE(ID INT IDENTITY(1, 1), StartDate DATE, EndDate DATE, MoodScore varchar(50))
DECLARE #CurrentStartDate DATE, #CurrentEndDate DATE, #CurrentMoodScore INT,
#PreviousStartDate DATE, #PreviousEndDate DATE, #PreviousMoodScore INT
Now we put all of the sample data into our CURSOR:
DECLARE MoodCursor CURSOR FOR
SELECT StartDate, EndDate, MoodRating
FROM #MoodRanges
OPEN MoodCursor
FETCH NEXT FROM MoodCursor INTO #CurrentStartDate, #CurrentEndDate, #CurrentMoodScore
WHILE ##FETCH_STATUS = 0
BEGIN
IF #PreviousStartDate IS NOT NULL
BEGIN
IF (#PreviousMoodScore >= 5 AND #CurrentMoodScore >= 5)
OR (#PreviousMoodScore < 5 AND #CurrentMoodScore < 5)
BEGIN
UPDATE #MoodResults
SET EndDate = #CurrentEndDate
WHERE ID = (SELECT MAX(ID) FROM #MoodResults)
END
ELSE
BEGIN
INSERT INTO
#MoodResults
VALUES
(#CurrentStartDate, #CurrentEndDate, CASE WHEN #CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
END
END
ELSE
BEGIN
INSERT INTO
#MoodResults
VALUES
(#CurrentStartDate, #CurrentEndDate, CASE WHEN #CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
END
SET #PreviousStartDate = #CurrentStartDate
SET #PreviousEndDate = #CurrentEndDate
SET #PreviousMoodScore = #CurrentMoodScore
FETCH NEXT FROM MoodCursor INTO #CurrentStartDate, #CurrentEndDate, #CurrentMoodScore
END
CLOSE MoodCursor
DEALLOCATE MoodCursor
And here are the results:
SELECT * FROM #MoodResults
ID StartDate EndDate MoodScore
----------- ---------- ---------- --------------------------------------------------
1 1991-01-01 1991-03-02 GOOD
2 1991-01-03 1991-03-04 BAD
3 1991-01-05 1991-03-06 GOOD
4 1991-01-07 1991-03-09 BAD
Is this what you're looking for?
IF OBJECT_ID('tempdb..#MyDailyMood', 'U') IS NOT NULL
DROP TABLE #MyDailyMood;
CREATE TABLE #MyDailyMood (
TheDate DATE NOT NULL,
MoodLevel INT NOT NULL
);
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
cte_Calendar (dt) AS (
SELECT TOP (DATEDIFF(dd, '2007-01-01', '2017-01-01'))
DATEADD(dd, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, '2007-01-01')
FROM
cte_n3 a CROSS JOIN cte_n3 b
)
INSERT #MyDailyMood (TheDate, MoodLevel)
SELECT
c.dt,
ABS(CHECKSUM(NEWID()) % 10) + 1
FROM
cte_Calendar c;
--==========================================================
WITH
cte_AddRN AS (
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS (
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
)
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup;
Post OP update solution...
WITH
cte_AddRN AS ( -- Add a row number to each row that resets to 1 ever 60 rows.
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS ( -- Use DENSE_RANK to create groups based on the RN added above.
-- How it works: RN set the row number 1 - 60 then repeats itself
-- but we dont want ever 60th row grouped together. We want blocks of 60 consecutive rows grouped together
-- DENSE_RANK accompolishes this by ranking within all the "1's", "2's"... and so on.
-- verify with the following query... SELECT * FROM cte_AssignGroups ag ORDER BY ag.TheDate
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
),
cte_AggRange AS ( -- This is just a straight forward aggregation/rollup. It produces the results similar to the sample data you posed in your edit.
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
GorB = CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END,
ag.DateGroup
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup
),
cte_CompactGroup AS ( -- This time we're using dense rank to group all of the consecutive "Good" and "Bad" values so that they can be further aggregated below.
SELECT
ar.BegOfRange, ar.EndOfRange, ar.AverageMoodLevel, ar.GorB, ar.DateGroup,
DenseGroup = ar.DateGroup - DENSE_RANK() OVER (PARTITION BY ar.GorB ORDER BY ar.BegOfRange)
FROM
cte_AggRange ar
)
-- The final aggregation step...
SELECT
BegOfRange = MIN(cg.BegOfRange),
EndOfRange = MAX(cg.EndOfRange),
cg.GorB
FROM
cte_CompactGroup cg
GROUP BY
cg.DenseGroup,
cg.GorB
ORDER BY
BegOfRange;
fellow developers and analysts. I have some experience in SQL and have resorted to similar posts. However, this is slightly more niche. Thank you in advance for helping.
I have the below dataset (edited. Apology)
Setup
CREATE TABLE CustomerPoints
(
CustomerID INT,
[Date] Date,
Points INT
)
INSERT INTO CustomerPoints
VALUES
(1, '20150101', 500),
(1, '20150201', -400),
(1, '20151101', 300),
(1, '20151201', -400)
and need to turn it into (edited. The figures in previous table were incorrect)
Any positive amount of points are points earned whereas negative are redeemed. Because of the FIFO (1st in 1st out concept), of the second batch of points spent (-400), 100 of those were taken from points earned on 20150101 (UK format) and 300 from 20151101.
The goal is to calculate, for each customer, the number of points spent within x and y months of earning. Again, thank you for your help.
I have already answered a similar question here and here
You need to explode points earned and redeemed by single units and then couple them, so each point earned will be matched by a redeemed point.
For each of these matching rows calculate the months elapsed from the earning to the redeeming and then aggregate it all.
For FN_NUMBERS(n) it is a tally table, look at other answers I have linked above.
;with
p as (select * from CustomerPoints),
e as (select * from p where points>0),
r as (select * from p where points<0),
ex as (
select *, ROW_NUMBER() over (partition by CustomerID order by [date] ) rn
from e
join FN_NUMBERS(1000) on N<= e.points
),
rx as (
select *, ROW_NUMBER() over (partition by CustomerID order by [date] ) rn
from r
join FN_NUMBERS(1000) on N<= -r.points
),
j as (
select ex.CustomerID, DATEDIFF(month,ex.date, rx.date) mm
from ex
join rx on ex.CustomerID = rx.CustomerID and ex.rn = rx.rn and rx.date>ex.date
)
-- use this select to see points redeemed in current and past semester
select * from j join (select 0 s union all select 1 s ) p on j.mm >= (p.s*6)+(p.s) and j.mm < p.s*6+6 pivot (count(mm) for s in ([0],[2])) p order by 1, 2
-- use this select to see points redeemed with months detail
--select * from j pivot (count(mm) for mm in ([0],[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) p order by 1
-- use this select to see points redeemed in rows per month
--select CustomerID, mm, COUNT(mm) PointsRedeemed from j group by CustomerID, mm order by 1
output of default query, 0 is 0-6 months, 1 is 7-12 (age of redemption in months)
CustomerID 0 1
1 700 100
output of 2nd query, 0..12 is the age of redemption in months
CustomerID 0 1 2 3 4 5 6 7 8 9 10 11 12
1 0 700 0 0 0 0 0 0 0 0 0 100 0
output from 3rd query, is the age of redemption in months
CustomerID mm PointsRedeemed
1 1 700
1 11 100
bye
Given a table structured like that:
id | news_id(fkey)| status | date
1 10 PUBLISHED 2016-01-10
2 20 UNPUBLISHED 2016-01-10
3 10 UNPUBLISHED 2016-01-12
4 10 PUBLISHED 2016-01-15
5 10 UNPUBLISHED 2016-01-16
6 20 PUBLISHED 2016-01-18
7 10 PUBLISHED 2016-01-18
8 20 UNPUBLISHED 2016-01-20
9 30 PUBLISHED 2016-01-20
10 30 UNPUBLISHED 2016-01-21
I'd like to count distinct news that, in given period time, had first and last status equal(and also status equal to given in query)
So, for this table query from 2016-01-01 to 2016-02-01 would return:
1 (with WHERE status = 'PUBLISHED') because news_id 10 had PUBLISHED in both first( 2016-01-10 ) and last row (2016-01-18)
1 (with WHERE status = 'UNPUBLISHED' because news_id 20 had UNPUBLISHED in both first and last row
notice how news_id = 30 does not appear in results, as his first/last statuses were contrary.
I have done that using following query:
SELECT count(*) FROM
(
SELECT DISTINCT ON (news_id)
news_id, status as first_status
FROM news_events
where date >= '2015-11-12 15:01:56.195'
ORDER BY news_id, date
) first
JOIN (
SELECT DISTINCT ON (news_id)
news_id, status as last_status
FROM news_events
where date >= '2015-11-12 15:01:56.195'
ORDER BY news_id, date DESC
) last
using (news_id)
where first_status = last_status
and first_status = 'PUBLISHED'
Now, I have to transform query into SQL our internal Java framework, unfortunately it does not support subqueries, except when using EXISTS or NOT EXISTS. I was told to transform the query to one using EXISTS clause(if it is possible) or try finding another solution. I am, however, clueless. Could anyone help me do that?
edit: As I am being told right now, the problem lies not with our framework, but in Hibernate - if I understood correctly, "you cannot join an inner select in HQL" (?)
Not sure if this adresses you problem correctly, since it is more of a workaround. But considering the following:
News need to be published before they can be "unpublished". So if you'd add 1 for each "published" and substract 1 for each "unpublished" your balance will be positive (or 1 to be exact) if first and last is "published". It will be 0 if you have as many unpublished as published and negative, if it has more unpublished than published (which logically cannot be the case but obviously might arise, since you set a date threshhold in the query where a 'published' might be occured before).
You might use this query to find out:
SELECT SUM(CASE status WHEN 'PUBLISHED' THEN 1 ELSE -1 END) AS 'publishbalance'
FROM news_events
WHERE date >= '2015-11-12 15:01:56.195'
GROUP BY news_id
First of all, subqueries are a substantial part of SQL. A framework forbidding their use is a bad framework.
However, "first" and "last" can be expressed with NOT EXISTS: where not exists an earlier or later entry for the same news_id and date range.
select count(*)
from mytable first
join mytable last on last.news_id = first.news_id
where date between #from and #to
and not exists
(
select *
from mytable before_first
where before_first.news_id = first.news_id
and before_first.date < first.date
and before_first.date >= #from
)
and not exists
(
select *
from mytable after_last
where after_last.news_id = last.news_id
and after_last.date > last.date
and after_last.date <= #to
)
and first.status = #status
and last.status = #status;
NOT EXISTS to the rescue:
SELECT ff.id ,ff.news_id ,ff.status , ff.zdate AS startdate
, ll.zdate AS enddate
FROM newsflash ff
JOIN newsflash ll
ON ff.news_id = ll.news_id
AND ff.status = ll.status
AND ff.zdate < ll.zdate
AND NOT EXISTS (
SELECT * FROM newsflash nx
WHERE nx.news_id = ff.news_id
AND nx.zdate >= '2016-01-01' AND nx.zdate < '2016-02-01'
AND (nx.zdate < ff.zdate OR nx.zdate > ll.zdate)
)
ORDER BY ff.id
;