Show Date Range in Custom Column - Gaps and Islands - sql

I have a table that looks like this:
+------------+------+
| Date | Name |
+------------+------+
| 2017-01-07 | A |
| 2017-01-08 | A |
| 2017-01-09 | A |
| 2017-01-12 | A |
| 2017-01-07 | B |
| 2017-01-08 | B |
| 2017-01-09 | B |
+------------+------+
I would like to be able to turn it into the following:
+-------------------------+------+
| Date Range | Name |
+-------------------------+------+
| 2017-01-07 - 2017-01-09 | A |
| 2017-01-07 - 2017-01-09 | B |
| 2017-01-12 | A |
+-------------------------+------+
The query should find the minimum and maximum of each run of consecutive dates, grouped by the Name column, and show them as a single 'from - to' string in one column.
I'm having trouble restricting each range to consecutive dates only. Note that the third row above gets its own entry because 2017-01-12 is not consecutive with the earlier date range for 'A'.
EDIT: Please note: This is specific to SQL Server 2008, which does not allow use of the LAG function.
EDIT 2:
The original answer supplied by McNets worked fine on SQL Server 2012. I've included it here as it's better if you have SQL Server 2012 onwards.
;WITH CalcDiffDays AS
(
SELECT Date, Name,
CONCAT (Name, CAST(DATEDIFF(DAY, LAG(Date, 1, Date - 1) OVER (PARTITION BY Name ORDER BY Name, Date), Date) AS VARCHAR(10))) AS NumDays
FROM #tmpTable
)
SELECT CONCAT(CONVERT(VARCHAR(20), MIN(Date), 102), ' - ', CONVERT(VARCHAR(20), MAX(Date), 102)) AS [Data Range], Name
FROM CalcDiffDays
GROUP BY NumDays, Name;

First I've added a row number to the whole table.
WITH RowN AS
(
SELECT Date, Name, ROW_NUMBER() OVER (ORDER BY Name, Date) RN
FROM #T
)
Then I've joined this table with itself just to calculate days between dates.
,CalcDiffDays AS
(
SELECT RowN.Date, RowN.Name,
ISLAND = RowN.Name +
CASE
WHEN RowN.RN > 1 AND RowN.Name = R2.Name THEN CAST(DATEDIFF(day, R2.Date, RowN.Date) AS VARCHAR(20))
ELSE '1'
END
FROM RowN
LEFT JOIN RowN R2 ON R2.RN = RowN.RN-1
)
GAPS: the number of days between consecutive dates of the same name.
ISLANDS: the island key, built by prepending the name to that number of days.
+---------------------+------+---------+
| Date | Name | ISLAND |
+---------------------+------+---------+
| 07.01.2017 00:00:00 | A | A1 |
+---------------------+------+---------+
| 08.01.2017 00:00:00 | A | A1 |
+---------------------+------+---------+
| 09.01.2017 00:00:00 | A | A1 |
+---------------------+------+---------+
| 12.01.2017 00:00:00 | A | A3 |
+---------------------+------+---------+
| 07.01.2017 00:00:00 | B | B1 |
+---------------------+------+---------+
| 08.01.2017 00:00:00 | B | B1 |
+---------------------+------+---------+
| 09.01.2017 00:00:00 | B | B1 |
+---------------------+------+---------+
The second part: get the MIN and MAX Date of each island.
WITH RowN AS
(
SELECT Date, Name, ROW_NUMBER() OVER (ORDER BY Name, Date) RN
FROM #T
)
,CalcDiffDays AS
(
SELECT RowN.Date, RowN.Name,
ISLAND = RowN.Name +
CASE
WHEN RowN.RN > 1 AND RowN.Name = R2.Name THEN CAST(DATEDIFF(day, R2.Date, RowN.Date) AS VARCHAR(20))
ELSE '1'
END
FROM RowN
LEFT JOIN RowN R2 ON R2.RN = RowN.RN-1
)
SELECT CONVERT(VARCHAR(20), MIN(Date), 102) + ' - ' + CONVERT(VARCHAR(20), MAX(Date), 102) AS [Data Range], Name
FROM CalcDiffDays
GROUP BY ISLAND, Name
ORDER BY MIN(Date);
+-------------------------+------+
| Data Range | Name |
+-------------------------+------+
| 2017.01.07 - 2017.01.09 | A |
+-------------------------+------+
| 2017.01.07 - 2017.01.09 | B |
+-------------------------+------+
| 2017.01.12 - 2017.01.12 | A |
+-------------------------+------+
You can check it here: http://rextester.com/MHLEEJ50479
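As a cross-check on the island logic, here is a minimal sketch of the common "date minus row number" grouping. It is written in Python against the standard-library sqlite3 module so that it runs without a SQL Server instance. The table name `T`, the function name `date_ranges` and the SQLite dialect (`julianday`, `||`) are assumptions for illustration, not part of the answer above. In T-SQL the same key would be computed roughly as `DATEADD(day, -ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date), Date)`, and `ROW_NUMBER` is available in SQL Server 2008. Unlike a key built from the day difference to the previous row, this key stays constant across a whole run, so a second multi-day island for the same name is not merged into the first one.

```python
import sqlite3

# Sample rows from the question, with dates as ISO strings.
SAMPLE_ROWS = [
    ("2017-01-07", "A"), ("2017-01-08", "A"), ("2017-01-09", "A"),
    ("2017-01-12", "A"),
    ("2017-01-07", "B"), ("2017-01-08", "B"), ("2017-01-09", "B"),
]

ISLAND_QUERY = """
WITH numbered AS (
    SELECT Date, Name,
           -- day number minus per-name row number: constant within one run
           CAST(julianday(Date) AS INTEGER)
             - ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date) AS grp
    FROM T
)
SELECT CASE WHEN MIN(Date) = MAX(Date) THEN MIN(Date)
            ELSE MIN(Date) || ' - ' || MAX(Date) END AS DateRange,
       Name
FROM numbered
GROUP BY Name, grp
ORDER BY MIN(Date), Name
"""


def date_ranges(rows):
    """Return (DateRange, Name) tuples, one per run of consecutive dates."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE T (Date TEXT, Name TEXT)")
        conn.executemany("INSERT INTO T VALUES (?, ?)", rows)
        return conn.execute(ISLAND_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for date_range, name in date_ranges(SAMPLE_ROWS):
        print(date_range, name)
```

The sketch needs an SQLite build with window functions (3.25 or later). A single-day island is shown as one date rather than as a 'from - to' pair, matching the output requested in the question.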

Related

SQL - get summary of differences vs previous month

I have a table similar to this one:
| id | store | BOMdate |
| 1 | A | 01/10/2018 |
| 1 | B | 01/10/2018 |
| 1 | C | 01/10/2018 |
|... | ... | ... |
| 1 | A | 01/11/2018 |
| 1 | C | 01/11/2018 |
| 1 | D | 01/11/2018 |
|... | ... | ... |
| 1 | B | 01/12/2018 |
| 1 | C | 01/12/2018 |
| 1 | E | 01/12/2018 |
It contains the stores that are active at BOM (beginning of month).
How do I query it to get the number of stores that are new that month - those that were not active the previous month?
The output should be this:
| BOMdate | #newstores |
| 01/10/2018 | 3 | * no stores on previous month
| 01/11/2018 | 1 | * D is the only new active store
| 01/12/2018 | 2 | * store B was not active in November, E is new
I know how to count the first time that each store is active (nested select, taking the MIN(BOMdate) and then counting). But I have no idea how to check each month against its previous month.
I use SQL Server, but I am interested in the differences in other platforms if there are any.
Thanks
How do I query it to get the number of stores that are new that month - those that were not active the previous month?
One option uses not exists:
select bomdate, count(*) cnt_new_stores
from mytable t
where not exists (
select 1
from mytable t1
where t1.store = t.store and t1.bomdate = dateadd(month, -1, t.bomdate)
)
group by bomdate
You can also use window functions:
select bomdate, count(*) cnt_new_stores
from (
select t.*, lag(bomdate) over(partition by store order by bomdate) lag_bomdate
from mytable t
) t
where bomdate <> dateadd(month, 1, lag_bomdate) or lag_bomdate is null
group by bomdate
You can compare a date with the previous month's date using the T-SQL DATEDIFF function.
Using NOT EXISTS you can count the stores that did not appear in the previous month, and you can also list their names using the STRING_AGG function, introduced in SQL Server 2017.
-- the outer table needs its own alias; unqualified columns inside the subquery would resolve to y
select t.BOMDate, NewStoresCount = count(1), NewStores = STRING_AGG(t.store, ',')
from yourtable t
where not exists
(
select 1
from yourtable y
where y.store = t.store and DATEDIFF(m, y.BOMDate, t.BOMDate) = 1
)
group by t.BOMDate
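To make the NOT EXISTS approach easy to try, here is a small sketch in Python using the standard-library sqlite3 module. The table name `mytable`, the function name `new_stores_per_month` and the ISO date strings are assumptions for illustration; the question's dates are read as dd/mm/yyyy (first day of each month), and SQLite's `date(x, '-1 month')` stands in for T-SQL's `dateadd(month, -1, x)`.

```python
import sqlite3

# Rows from the question (store, BOMdate), with the elided "..." rows left out.
ROWS = [
    ("A", "2018-10-01"), ("B", "2018-10-01"), ("C", "2018-10-01"),
    ("A", "2018-11-01"), ("C", "2018-11-01"), ("D", "2018-11-01"),
    ("B", "2018-12-01"), ("C", "2018-12-01"), ("E", "2018-12-01"),
]

NEW_STORES_QUERY = """
SELECT t.bomdate, COUNT(*) AS cnt_new_stores
FROM mytable AS t
WHERE NOT EXISTS (
    -- the same store, exactly one month earlier
    SELECT 1
    FROM mytable AS prev
    WHERE prev.store = t.store
      AND prev.bomdate = date(t.bomdate, '-1 month')
)
GROUP BY t.bomdate
ORDER BY t.bomdate
"""


def new_stores_per_month(rows):
    """Return (bomdate, count) for stores not active in the previous month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (store TEXT, bomdate TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
        return conn.execute(NEW_STORES_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for bomdate, count in new_stores_per_month(ROWS):
        print(bomdate, count)
```

Both tables in the subquery carry explicit aliases, so the comparison is always between the outer row and the candidate row from the previous month.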

Group by month, day, hour + gaps and islands problem

I need to calculate, as a percentage, how long status was true during each hour, day or month (working_time).
I have simplified my table to this:
| date | status |
|-------------------------- |-------- |
| 2018-11-05T19:04:21.125Z | true |
| 2018-11-05T19:04:22.125Z | true |
| 2018-11-05T19:04:23.125Z | true |
| 2018-11-05T19:04:24.125Z | false |
| 2018-11-05T19:04:25.125Z | true |
....
Depending on a parameter, I need to get a result like this:
for hours:
| date | working_time |
|-------------------------- |--------------|
| 2018-11-05T00:00:00.000Z | 14 |
| 2018-11-05T01:00:00.000Z | 15 |
| 2018-11-05T02:00:00.000Z | 32 |
|... | ... |
| 2018-11-05T23:00:00.000Z | 13 |
for months:
| date | working_time |
|-------------------------- |--------------|
| 2018-01-01T00:00:00.000Z | 14 |
| 2018-02-01T00:00:00.000Z | 15 |
| 2018-03-01T00:00:00.000Z | 32 |
|... | ... |
| 2018-12-01T00:00:00.000Z | 13 |
My SQL query looks like this:
SELECT date_trunc('month', date) as date,
round((EXTRACT(epoch from sum(time_diff)) / 25920) :: numeric, 2) as working_time
FROM (SELECT date,
status as current_status,
(lag(status, 1) OVER (ORDER BY date)) AS previous_status,
(date -(lag(date, 1) OVER (ORDER BY date))) AS time_diff
FROM table
) as raw_data
WHERE current_status = TRUE AND previous_status = TRUE
GROUP BY date_trunc('month', date)
ORDER BY date;
It works, but it is really slow. Any ideas for optimisation? Maybe using the Row_Number() function?
Try this:
SELECT t.month_reference AS date,
round((sum(CASE WHEN t_aux.status THEN 1 ELSE 0 END) / 25920.0)::numeric, 2) AS working_time
-- I assume you use this number because it is the uptime of the system: 60*18*24.
-- For the total seconds in the month I would use 60*60*24 * (number of days in that month).
FROM (SELECT DISTINCT date_trunc('month', date) AS month_reference -- one row per month
FROM table
) AS t
LEFT JOIN table t_aux
ON t.month_reference = date_trunc('month', t_aux.date)
-- so when we group by month, sum() only finds the rows that are true and belong to the referenced month
AND t_aux.date <
(SELECT t1.date
FROM table t1
WHERE t.month_reference = date_trunc('month', t1.date)
AND t1.status = false
ORDER BY t1.date ASC LIMIT 1)
-- I add this so it only selects the rows that are true until it finds a row with status false in the same month
GROUP BY t.month_reference
ORDER BY t.month_reference;

Split the date of same column in multiple rows till the next date value is specified - SQL Server

I have this table
+------+------------+-----+
| Code | date | qty |
+------+------------+-----+
| 1 | 06-07-2017 | 44 |
| 1 | 08-07-2017 | 45 |
| 2 | 07-07-2017 | 32 |
| 2 | 09-07-2017 | 33 |
+------+------------+-----+
and I want to display it this way
+------+------------+-----+
| Code | date | qty |
+------+------------+-----+
| 1 | 06-07-2017 | 44 |
| 1 | 07-07-2017 | 44 |
| 1 | 08-07-2017 | 45 |
| 2 | 07-07-2017 | 32 |
| 2 | 08-07-2017 | 32 |
| 2 | 09-07-2017 | 33 |
+------+------------+-----+
I want to split the date of same 'Code' and keep the same value for 'qty' till the next date of same 'Code'.
You need a calendar table and Outer Apply
;WITH cte
AS (SELECT Min([date]) AS st,
Max([date]) ed,
code
FROM Yourtable
GROUP BY code
UNION ALL
SELECT Dateadd(dd, 1, st) AS st,
ed,
code
FROM cte
WHERE Dateadd(dd, 1, st) <= ed)
SELECT c.code,
[date]=c.st,
qty
FROM cte c
OUTER apply (SELECT TOP 1 qty
FROM Yourtable a
WHERE a.code = c.code
AND c.st >= a.[date]
ORDER BY [date] DESC) oa
ORDER BY c.code,st
Note: For the sake of completeness I have used a recursive CTE to generate the dates. You can always create a physical calendar table in your database and use that instead.
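The same idea (generate every date between each code's first and last date, then carry the most recent qty forward) can be sketched in Python with the standard-library sqlite3 module. The table name `stock`, the column name `dt`, the function name `fill_dates` and the ISO dates are assumptions for illustration; the question's dates are read as dd-mm-yyyy. SQLite has no OUTER APPLY, so a correlated subquery with LIMIT 1 plays the role of the TOP 1 lookup.

```python
import sqlite3

# Rows from the question as (code, date, qty).
ROWS = [
    (1, "2017-07-06", 44), (1, "2017-07-08", 45),
    (2, "2017-07-07", 32), (2, "2017-07-09", 33),
]

FILL_QUERY = """
WITH RECURSIVE span(code, dt, last_dt) AS (
    -- anchor: first and last date per code
    SELECT code, MIN(dt), MAX(dt) FROM stock GROUP BY code
    UNION ALL
    -- step: advance one day until the last date is reached
    SELECT code, date(dt, '+1 day'), last_dt FROM span WHERE dt < last_dt
)
SELECT s.code, s.dt,
       -- most recent qty on or before the generated date
       (SELECT a.qty FROM stock AS a
        WHERE a.code = s.code AND a.dt <= s.dt
        ORDER BY a.dt DESC LIMIT 1) AS qty
FROM span AS s
ORDER BY s.code, s.dt
"""


def fill_dates(rows):
    """Return (code, date, qty) with missing days filled from the prior row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE stock (code INTEGER, dt TEXT, qty INTEGER)")
        conn.executemany("INSERT INTO stock VALUES (?, ?, ?)", rows)
        return conn.execute(FILL_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in fill_dates(ROWS):
        print(row)
```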

Count concurrent dates in user-input date range using SQL

The user will input a date range, and I want to output in SQL every date in that range (inclusive) together with the number of concurrent uses of the equipment on that date.
In this example, the user date range is 03/08/2016 to 03/09/2016, so you can see below I include anything on or between those dates (grouped by category, but I've simplified here by only using 'powerchair').
The table schema is as follows:
trans_date | trans_end_date | eq_category
17/03/2016 | 16/10/2016 | POWERCHAIR
08/08/2016 | 08/08/2016 | POWERCHAIR
12/08/2016 | 12/08/2016 | POWERCHAIR
17/08/2016 | 18/08/2016 | POWERCHAIR
22/08/2016 | 22/08/2016 | POWERCHAIR
26/08/2016 | 26/08/2016 | POWERCHAIR
02/09/2016 | 02/09/2016 | POWERCHAIR
And I would like to output:
date | concurrent_use
03-08-2016 | 1
04-08-2016 | 1
05-08-2016 | 1
06-08-2016 | 1
07-08-2016 | 1
08-08-2016 | 2
09-08-2016 | 1
10-08-2016 | 1
11-08-2016 | 1
12-08-2016 | 2
13-08-2016 | 1
14-08-2016 | 1
15-08-2016 | 1
16-08-2016 | 1
17-08-2016 | 2
18-08-2016 | 2
19-08-2016 | 1
20-08-2016 | 1
21-08-2016 | 1
22-08-2016 | 2
23-08-2016 | 1
24-08-2016 | 1
25-08-2016 | 1
26-08-2016 | 2
27-08-2016 | 1
28-08-2016 | 1
29-08-2016 | 1
30-08-2016 | 1
31-08-2016 | 1
01-09-2016 | 1
02-09-2016 | 2
03-09-2016 | 1
Anything with 1 or 0 I can then filter out, as no equipment can have been out concurrently that day.
I don't think this is a gaps/islands problem, but I'm drawing a blank trying to get this in an SQL statement.
Try it like below. You need to generate the dates using a recursive CTE, then count how many rows have a range that contains each date.
;WITH CTE
AS (SELECT CAST('2016-08-03' AS DATE) DATE1
UNION ALL
SELECT Dateadd(DAY, 1, DATE1) AS DATE1
FROM CTE
WHERE Dateadd(DD, 1, DATE1) <= '2016-09-03')
SELECT C.DATE1,
Count(1) OCCURRENCES
FROM CTE C
JOIN #TABLE1 T
ON C.DATE1 BETWEEN [TRANS_DATE] AND [TRANS_END_DATE]
GROUP BY C.DATE1
You need a set of numbers or dates. So, if you want everything in that range:
with d as (
select cast('2016-08-03' as date) as d
union all
select dateadd(day, 1, d.d)
from d
where d.d < '2016-09-03'
)
select d.d, count(s.trans_date)
from d left join
yourtable s
on d.d between s.trans_date and s.trans_end_date
group by d.d;
I'm not sure if both the start and end dates are included in the range.
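For a quick check against the sample data, here is a sketch of the "generate the dates, then left join and count" approach in Python with the standard-library sqlite3 module. The table name `rentals`, the function name `concurrent_use`, the parameter names `:first_day` and `:last_day`, and the ISO dates are assumptions for illustration; the question's dates are read as dd/mm/yyyy and the eq_category column is left out. Both ends of each rental are treated as inclusive, which is what the desired output in the question implies.

```python
import sqlite3

# Rows from the question as (trans_date, trans_end_date).
RENTALS = [
    ("2016-03-17", "2016-10-16"),
    ("2016-08-08", "2016-08-08"),
    ("2016-08-12", "2016-08-12"),
    ("2016-08-17", "2016-08-18"),
    ("2016-08-22", "2016-08-22"),
    ("2016-08-26", "2016-08-26"),
    ("2016-09-02", "2016-09-02"),
]

CONCURRENT_QUERY = """
WITH RECURSIVE days(d) AS (
    SELECT :first_day
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < :last_day
)
SELECT days.d, COUNT(r.trans_date) AS concurrent_use
FROM days
-- LEFT JOIN keeps days with no rentals; COUNT of a NULL column gives 0
LEFT JOIN rentals AS r
       ON days.d BETWEEN r.trans_date AND r.trans_end_date
GROUP BY days.d
ORDER BY days.d
"""


def concurrent_use(rows, first_day, last_day):
    """Return (date, number of rentals covering that date) for each day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE rentals (trans_date TEXT, trans_end_date TEXT)"
        )
        conn.executemany("INSERT INTO rentals VALUES (?, ?)", rows)
        params = {"first_day": first_day, "last_day": last_day}
        return conn.execute(CONCURRENT_QUERY, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for day, count in concurrent_use(RENTALS, "2016-08-03", "2016-09-03"):
        print(day, count)
```

Counting `r.trans_date` rather than `*` is what lets a day with no matching rental show 0 instead of 1.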

Making Row Entries Pair Horizontally in SQL

So this question is similar to one I've asked before, but slightly different.
I'm looking at data for clients who are admitted to and discharged from a program. An assessment is done and scored at each admit and each discharge, and clients are sometimes admitted and discharged multiple times during a time period.
I need to be able to pair each client's admit score with their following discharge date so I can look at all clients who improved a certain amount from admit to discharge for each of their admits and discharges.
This is a dummy sample of how my data results are formatted right now:
And this is how I'd ideally like it formatted:
But I'd take any point in the right direction or similar formatting help that would allow me to be able to compare all of the instances of admit and discharge scores for all the clients.
Thanks!
In order to get the result, you can apply both the UNPIVOT and the PIVOT functions. The UNPIVOT will convert your multiple columns of date and score into rows, then you can pivot those rows back into columns.
The unpivot syntax will be similar to this:
select person,
casenumber,
ScoreType+'_'+col col,
value,
rn
from
(
select person,
casenumber,
convert(varchar(10), date, 101) date,
cast(score as varchar(10)) score,
scoreType,
row_number() over(partition by casenumber, scoretype
order by case scoretype when 'Admit' then 1 end, date) rn
from yourtable
) d
unpivot
(
value
for col in (date, score)
) unpiv
This gives a result:
| PERSON | CASENUMBER | COL | VALUE | RN |
-----------------------------------------------------------
| Jon | 3412 | Discharge_date | 01/03/2013 | 1 |
| Jon | 3412 | Discharge_score | 12 | 1 |
| Al | 3452 | Admit_date | 05/16/2013 | 1 |
| Al | 3452 | Admit_score | 15 | 1 |
| Al | 3452 | Discharge_date | 08/01/2013 | 1 |
| Al | 3452 | Discharge_score | 13 | 1 |
As you can see this query also creates the new columns to then pivot. So the final code will be:
select person, casenumber,
Admit_Date, Admit_Score, Discharge_Date, Discharge_Score
from
(
select person,
casenumber,
ScoreType+'_'+col col,
value,
rn
from
(
select person,
casenumber,
convert(varchar(10), date, 101) date,
cast(score as varchar(10)) score,
scoreType,
row_number() over(partition by casenumber, scoretype
order by case scoretype when 'Admit' then 1 end, date) rn
from yourtable
) d
unpivot
(
value
for col in (date, score)
) unpiv
) src
pivot
(
max(value)
for col in (Admit_Date, Admit_Score, Discharge_Date, Discharge_Score)
) piv;
This gives a result:
| PERSON | CASENUMBER | ADMIT_DATE | ADMIT_SCORE | DISCHARGE_DATE | DISCHARGE_SCORE |
-------------------------------------------------------------------------------------
| Al | 3452 | 05/16/2013 | 15 | 08/01/2013 | 13 |
| Cindy | 6578 | 01/02/2013 | 17 | 03/04/2013 | 14 |
| Cindy | 6578 | 03/04/2013 | 14 | 03/18/2013 | 12 |
| Jon | 3412 | (null) | (null) | 01/03/2013 | 12 |
| Kevin | 9868 | 01/18/2013 | 19 | 03/02/2013 | 15 |
| Kevin | 9868 | 03/02/2013 | 15 | (null) | (null) |
| Pete | 4765 | 02/06/2013 | 15 | (null) | (null) |
| Susan | 5421 | 04/06/2013 | 19 | 05/07/2013 | 15 |
SELECT
ad.person, ad.CaseNumber, ad.Date as AdmitScoreDate, ad.Score as AdmitScore,
dis.date as DischargeScoreDate, dis.Score as DischargeScore
From
yourTable ad, yourTable dis
WHERE
ad.person=dis.person
and
ad.ScoreType='Admit'
and dis.ScoreType='Discharge';
If all the columns you mentioned are in the same table, you can join the table to itself:
SELECT t1.person,
t1.caseNumber,
t1.date adate,
t1.score ascore,
t1.scoreType ascoreType,
t2.date ddate,
t2.score dscore,
t2.scoreType dscoretype
FROM patient t1
join patient t2
on t1.casenumber=t2.casenumber
and t1.scoreType!=t2.scoreType
and t1.scoreType='Admit'
But this will not show you records of people who have been admitted and not yet discharged. I don't know if you were also looking for that information.
Hope this helps!