Following Start and End Date Columns - sql

I have start and end date columns, and in some rows the start date equals the end date of the previous row, with no gap. Starting from the row whose End Date is NULL, I want to "zig-zag" upward through the rows for as long as each Start Date matches the previous row's End Date, and return that whole chain as a single range.
I've tried CTEs and ROW_NUMBER() OVER().
START_DTE END_DTE
2018-01-17 2018-01-19
2018-01-26 2018-02-22
2018-02-22 2018-08-24
2018-08-24 2018-09-24
2018-09-24 NULL
Expected:
START_DTE END_DTE
2018-01-26 2018-09-24
EDIT
Using a proposed solution with an added CTE to ensure dates don't have times with them.
WITH
CTE_TABLE_NAME AS
(
SELECT
ID_NUM,
CONVERT(DATE,START_DTE) START_DTE,
CONVERT(DATE,END_DTE) END_DTE
FROM
TABLE_NAME
WHERE ID_NUM = 123
)
select min(start_dte) as start_dte, max(end_dte) as end_dte, grp
from (select t.*,
sum(case when prev_end_dte = end_dte then 0 else 1 end) over (order by start_dte) as grp
from (select t.*,
lag(end_dte) over (order by start_dte) as prev_end_dte
from CTE_TABLE_NAME t
) t
) t
group by grp;
This query produces the following results:
start_dte end_dte grp
2014-08-24 2014-12-19 1
2014-08-31 2014-09-02 2
2014-09-02 2014-09-18 3
2014-09-18 2014-11-03 4
2014-11-18 2014-12-09 5
2014-12-09 2015-01-16 6
2015-01-30 2015-02-02 7
2015-02-02 2015-05-15 8
2015-05-15 2015-07-08 9
2015-07-08 2015-07-09 10
2015-07-09 2015-08-25 11
2015-08-31 2015-09-01 12
2015-10-06 2015-10-29 13
2015-11-10 2015-12-11 14
2015-12-11 2015-12-15 15
2015-12-15 2016-01-20 16
2016-01-29 2016-02-01 17
2016-02-01 2016-03-03 18
2016-03-30 2016-08-29 19
2016-08-30 2016-12-06 20
2017-01-27 2017-02-20 21
2017-02-20 2017-08-15 22
2017-08-15 2017-08-29 23
2017-08-29 2018-01-17 24
2018-01-17 2018-01-19 25
2018-01-26 2018-02-22 26
2018-02-22 2018-08-24 27
2018-08-24 2018-09-24 28
2018-09-24 NULL 29
I tried adding having count(*) > 1 as suggested, but it returned no rows.
Expected example
START_DTE END_DTE
2017-01-27 2018-01-17
2018-01-26 2018-09-24

You can identify where groups of connected rows start by looking for where adjacent rows are not connected. A cumulative sum of these starts then gives you the groups.
select min(start_dte) as start_dte, max(end_dte) as end_dte
from (select t.*,
sum(case when prev_end_dte = start_dte then 0 else 1 end) over (order by start_dte) as grp
from (select t.*,
lag(end_dte) over (order by start_dte) as prev_end_dte
from t
) t
) t
group by grp;
If you want only multiply connected rows (as implied by your question), then add having count(*) > 1 to the outer query.
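For anyone who wants to see the mechanics outside SQL, here is a minimal Python sketch of the same idea (lag, then a running count of group starts, then min/max per group). The function name merge_connected and the only_multi flag are made up for illustration; only_multi plays the role of having count(*) > 1. Like max() in SQL, it ignores NULL (None) end dates.

```python
from datetime import date
from itertools import groupby

def merge_connected(rows, only_multi=False):
    """rows: list of (start, end) tuples; end may be None for an open range."""
    tagged = []
    grp = 0
    prev_end = None
    for start, end in sorted(rows, key=lambda r: r[0]):
        # Same test as the CASE expression: a row starts a new group
        # unless its start equals the previous row's end.
        if prev_end is None or start != prev_end:
            grp += 1
        tagged.append((grp, start, end))
        prev_end = end

    merged = []
    for _, members in groupby(tagged, key=lambda t: t[0]):
        members = list(members)
        if only_multi and len(members) < 2:
            continue  # equivalent of HAVING COUNT(*) > 1
        ends = [e for _, _, e in members if e is not None]
        merged.append((min(s for _, s, _ in members), max(ends) if ends else None))
    return merged

# The five sample rows from the question.
sample = [
    (date(2018, 1, 17), date(2018, 1, 19)),
    (date(2018, 1, 26), date(2018, 2, 22)),
    (date(2018, 2, 22), date(2018, 8, 24)),
    (date(2018, 8, 24), date(2018, 9, 24)),
    (date(2018, 9, 24), None),
]
all_islands = merge_connected(sample)
chains_only = merge_connected(sample, only_multi=True)
```

With the sample rows, chains_only holds the single range 2018-01-26 to 2018-09-24, which is the expected output in the question.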
Here is a db<>fiddle.

Related

Get quarter start/end dates for more than a year (start year to current year)

I've been trying to get start and end dates range for each quarter given a specific date/year, like this:
SELECT DATEADD(mm, (quarter - 1) * 3, year_date) StartDate,
DATEADD(dd, 0, DATEADD(mm, quarter * 3, year_date)) EndDate
--quarter QuarterNo
FROM
(
SELECT '2012-01-01' year_date
) s CROSS JOIN
(
SELECT 1 quarter UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4
) q
which produces the following output:
2012-01-01 00:00:00 2012-04-01 00:00:00
2012-04-01 00:00:00 2012-07-01 00:00:00
2012-07-01 00:00:00 2012-10-01 00:00:00
2012-10-01 00:00:00 2013-01-01 00:00:00
Problem: I need to do this for a given start_date and end_date, the problem being the end_date=current_day, so how can I achieve this:
2012-01-01 00:00:00 2012-04-01 00:00:00
2012-04-01 00:00:00 2012-07-01 00:00:00
2012-07-01 00:00:00 2012-10-01 00:00:00
2012-10-01 00:00:00 2013-01-01 00:00:00
... ...
2021-01-01 00:00:00 2021-01-06 00:00:00
I think this is what you want to do:
SET startdatevar = '2020-01-10'::date;
WITH RECURSIVE cte AS (
SELECT $startdatevar AS startdate , DATEADD(QUARTER, 1 , $startdatevar) enddate , 1 quarter
UNION ALL
SELECT enddate , CASE WHEN DATEADD(QUARTER, 1 , enddate) > CURRENT_DATE() THEN GETDATE() ELSE DATEADD(QUARTER, 1 , enddate) END enddate, quarter + 1
FROM cte
WHERE
cte.enddate <= CURRENT_DATE()
and quarter < 4
)
SELECT * FROM cte
To use your code, if you want to have more than 4 quarters:
SET quarter_limit = DATEDIFF(quarter , <startdate>,<enddate>)
;WITH RECURSIVE cte(q, qDate,enddate) as
(
select 1,
DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
,time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
UNION ALL
select q+1,
DATEADD(q, 1, qdate) -- next quarter start date
,time_slice(qdate::date, (q+1)*3, 'MONTH', 'END')
from cte
where q < quarter_limit -- limiting the number of next quarters
AND cte.endDate <= <enddate>
)
SELECT * FROM cte
After @eshirvana's answer, I came up with this slightly changed version:
WITH RECURSIVE cte(q, qDate,enddate) as
(
select 1,
DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
,time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
UNION ALL
select q+1,
DATEADD(q, 1, qdate) -- next quarter start date
,time_slice(qdate::date, (q+1)*3, 'MONTH', 'END')
from cte
where q <4 -- limiting the number of next quarters
AND cte.endDate <= CURRENT_DATE()
)
SELECT * FROM cte
This works fine for whatever year I pass in (2012 produces 4 records, 2021 just one, since we're still in the first quarter right now).
[EDIT]: it still doesn't work as expected after your 2nd code suggestion:
WITH RECURSIVE cte(q, qDate,enddate) as
(
select 1,
DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
,CASE WHEN time_slice('2012-01-01'::date, 3, 'MONTH', 'END') > CURRENT_DATE
THEN current_date
ELSE time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
END
UNION ALL
select q+1,
DATEADD(q, 1, qdate) -- next quarter start date
,time_slice(qdate::date, (q+1)*3, 'MONTH', 'END')
from cte
where q < DATEDIFF(quarter , '2012-01-01'::date,'2021-01-06'::date)
AND cte.endDate <= '2021-01-06'::date
)
SELECT * FROM cte
Sorry @eshirvana, it doesn't work as expected. It goes well up to a point, but then it returns too few records, some of them wrong, like this:
1 2012-01-01 2012-04-01
2 2012-04-01 2012-07-01
3 2012-07-01 2012-10-01
4 2012-10-01 2013-01-01
5 2013-01-01 2013-10-01
6 2013-04-01 2013-07-01
7 2013-07-01 2013-10-01
8 2013-10-01 2014-01-01
9 2014-01-01 2015-01-01
10 2014-04-01 2015-01-01
11 2014-07-01 2016-10-01
12 2014-10-01 2015-01-01
13 2015-01-01 2015-07-01
14 2015-04-01 2015-07-01
15 2015-07-01 2018-10-01
16 2015-10-01 2018-01-01
17 2016-01-01 2016-10-01
18 2016-04-01 2019-07-01
19 2016-07-01 2017-07-01
20 2016-10-01 2020-01-01
21 2017-01-01 2017-04-01
22 2017-04-01 2019-07-01
23 2017-07-01 2021-10-01
My logic still isn't right, since it doesn't print just the Q1 dates for 2021, but could these output issues be related to the date format or something?
Now it seems to be working, at least for 2012-01-01 until today (2021-01-06).
The code:
WITH RECURSIVE cte(q, qDate,enddate) as
(
select
-- it might not be the first quarter, so better to protect that:
quarter('2012-01-01'::date)::numeric
, DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
, CASE WHEN time_slice('2012-01-01'::date, 3, 'MONTH', 'END') > '2021-01-06'::date
THEN '2021-01-06'::date
ELSE time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
END
UNION ALL
select q+1
, DATEADD(q, 1, qdate) -- next quarter start date
,CASE WHEN time_slice(DATEADD(q, 1, qdate), 3, 'MONTH', 'END')> '2021-01-06'::date
THEN '2021-01-06'::date
ELSE time_slice(DATEADD(q, 1, qdate), 3, 'MONTH', 'END')
END
from cte
where q <= DATEDIFF(quarter , '2012-01-01'::date,'2021-01-06'::date)
AND cte.endDate <= '2021-01-06'::date
)
SELECT * FROM cte
The output:
1 2012-01-01 2012-04-01
2 2012-04-01 2012-07-01
3 2012-07-01 2012-10-01
4 2012-10-01 2013-01-01
5 2013-01-01 2013-04-01
6 2013-04-01 2013-07-01
7 2013-07-01 2013-10-01
8 2013-10-01 2014-01-01
9 2014-01-01 2014-04-01
10 2014-04-01 2014-07-01
11 2014-07-01 2014-10-01
12 2014-10-01 2015-01-01
13 2015-01-01 2015-04-01
14 2015-04-01 2015-07-01
15 2015-07-01 2015-10-01
16 2015-10-01 2016-01-01
17 2016-01-01 2016-04-01
18 2016-04-01 2016-07-01
19 2016-07-01 2016-10-01
20 2016-10-01 2017-01-01
21 2017-01-01 2017-04-01
22 2017-04-01 2017-07-01
23 2017-07-01 2017-10-01
24 2017-10-01 2018-01-01
25 2018-01-01 2018-04-01
26 2018-04-01 2018-07-01
27 2018-07-01 2018-10-01
28 2018-10-01 2019-01-01
29 2019-01-01 2019-04-01
30 2019-04-01 2019-07-01
31 2019-07-01 2019-10-01
32 2019-10-01 2020-01-01
33 2020-01-01 2020-04-01
34 2020-04-01 2020-07-01
35 2020-07-01 2020-10-01
36 2020-10-01 2021-01-01
37 2021-01-01 2021-01-06
In case you're wondering: yes, the idea is to present the end date as the last day of the month plus one day. But it could easily be adapted.
It's not pretty, but I think it's fairly easy to understand.
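As a cross-check on the expected rows, here is a small Python sketch of the same logic: step quarter by quarter from the start date and cap the last end date at the given end date. The function name quarter_ranges is made up for illustration, and it snaps to the quarter containing the start date rather than to January 1st of that year, which is an assumption on my side.

```python
from datetime import date

def quarter_ranges(start, end):
    """Return (quarter_start, quarter_end) pairs covering start..end.

    quarter_end is the first day of the next quarter, capped at `end`.
    """
    # Snap to the first month of the quarter that contains `start`.
    year, month = start.year, 3 * ((start.month - 1) // 3) + 1
    ranges = []
    while date(year, month, 1) < end:
        q_start = date(year, month, 1)
        month += 3
        if month > 12:
            year, month = year + 1, month - 12
        # Cap the final quarter at the requested end date.
        ranges.append((q_start, min(date(year, month, 1), end)))
    return ranges

quarters = quarter_ranges(date(2012, 1, 1), date(2021, 1, 6))
```

For 2012-01-01 to 2021-01-06 this gives 37 ranges, the last one being 2021-01-01 to 2021-01-06, matching the output above.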

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.
As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
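If it helps to check the numbers by hand, the same calculation can be sketched in Python: for every row with a zero running amount, take the gap in days to the next row of the same ID, then take the median per ID. The function name median_days_after_zero is made up for illustration, and the data below is all of ID 10 plus only the last three rows of ID 20 from the question.

```python
from datetime import datetime
from statistics import median

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

def median_days_after_zero(rows):
    """rows: iterable of (id, timestamp, running_amount)."""
    by_id = {}
    for id_, when, running in sorted(rows, key=lambda r: (r[0], r[1])):
        by_id.setdefault(id_, []).append((when, running))
    result = {}
    for id_, items in by_id.items():
        # Pairing each row with its successor is the lead() step; a zero
        # on the last row has no successor and is skipped, like a NULL diff.
        diffs = [
            (next_when - when).total_seconds() / 86400
            for (when, running), (next_when, _) in zip(items, items[1:])
            if running == 0
        ]
        if diffs:
            result[id_] = median(diffs)
    return result

rows = [
    (10, ts("2019-06-27 14:30"), 100), (10, ts("2019-06-29 15:26"), 0),
    (10, ts("2019-07-03 01:56"), 83), (10, ts("2019-07-04 17:53"), 98),
    (10, ts("2019-07-05 15:09"), 0), (10, ts("2019-07-05 15:53"), 98.98),
    (10, ts("2019-07-05 19:54"), 0), (10, ts("2019-07-07 01:36"), 90.97),
    (10, ts("2019-07-07 13:02"), 0), (10, ts("2019-07-07 16:32"), 39.88),
    (10, ts("2019-07-08 13:41"), 89.88),
    (20, ts("2019-01-10 16:18"), 0), (20, ts("2019-01-12 16:46"), 894.49),
    (20, ts("2019-01-25 05:40"), 23.44),
]
medians = median_days_after_zero(rows)
```

Rounding medians[10] and medians[20] gives 1 and 2, the same as the MEDIAN_AGE_ROUNDED column.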
db<>fiddle

SQL Collapse Data

I am trying to collapse data that is in a sequence sorted by date, grouping on the person and the type.
The data is stored in SQL Server and looks like the following -
seq person date type
--- ------ ------------------- ----
1 1 2018-02-10 08:00:00 1
2 1 2018-02-11 08:00:00 1
3 1 2018-02-12 08:00:00 1
4 1 2018-02-14 16:00:00 1
5 1 2018-02-15 16:00:00 1
6 1 2018-02-16 16:00:00 1
7 1 2018-02-20 08:00:00 2
8 1 2018-02-21 08:00:00 2
9 1 2018-02-22 08:00:00 2
10 1 2018-02-23 08:00:00 1
11 1 2018-02-24 08:00:00 1
12 1 2018-02-25 08:00:00 2
13 2 2018-02-10 08:00:00 1
14 2 2018-02-11 08:00:00 1
15 2 2018-02-12 08:00:00 1
16 2 2018-02-14 16:00:00 3
17 2 2018-02-15 16:00:00 3
18 2 2018-02-16 16:00:00 3
This data set contains about 1.2 million records that resemble the above.
The result that I would like to get from this would be -
person start type
------ ------------------- ----
1 2018-02-10 08:00:00 1
1 2018-02-20 08:00:00 2
1 2018-02-23 08:00:00 1
1 2018-02-25 08:00:00 2
2 2018-02-10 08:00:00 1
2 2018-02-14 16:00:00 3
I have the data in the first format by running the following query -
select
ROW_NUMBER() OVER (ORDER BY date) AS seq,
person,
date,
type
from table
group by person, date, type
I am just not sure how to keep the minimum date with the other distinct values from person and type.
This is a gaps-and-islands problem, so you can use the difference of two row_number() values as a grouping key:
select person, min(date) as start, type
from (select *,
row_number() over (partition by person order by seq) seq1,
row_number() over (partition by person, type order by seq) seq2
from table
) t
group by person, type, (seq1 - seq2)
order by person, start;
The correct solution using the difference of row numbers is:
select person, type, min(date) as start
from (select t.*,
row_number() over (partition by person order by seq) as seqnum_p,
row_number() over (partition by person, type order by seq) as seqnum_pt
from t
) t
group by person, type, (seqnum_p - seqnum_pt)
order by person, start;
type needs to be included in the GROUP BY.
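To see why the difference of the two row numbers works, here is a rough Python sketch that computes both counters by hand. Within one person, the difference stays constant for as long as the type does not change, so (person, type, difference) identifies each run. The function name collapse_runs is made up for illustration, and the data is person 1 from the question, with dates kept as ISO strings.

```python
from collections import defaultdict

def collapse_runs(rows):
    """rows: list of (seq, person, date, type); returns (person, start, type)."""
    rn_person = defaultdict(int)       # row_number() over (partition by person)
    rn_person_type = defaultdict(int)  # row_number() over (partition by person, type)
    starts = {}
    for seq, person, day, type_ in sorted(rows):  # ordered by seq
        rn_person[person] += 1
        rn_person_type[(person, type_)] += 1
        diff = rn_person[person] - rn_person_type[(person, type_)]
        key = (person, type_, diff)               # the GROUP BY key
        starts[key] = min(starts.get(key, day), day)
    return sorted((p, start, t) for (p, t, _), start in starts.items())

rows = [
    (1, 1, "2018-02-10 08:00:00", 1), (2, 1, "2018-02-11 08:00:00", 1),
    (3, 1, "2018-02-12 08:00:00", 1), (4, 1, "2018-02-14 16:00:00", 1),
    (5, 1, "2018-02-15 16:00:00", 1), (6, 1, "2018-02-16 16:00:00", 1),
    (7, 1, "2018-02-20 08:00:00", 2), (8, 1, "2018-02-21 08:00:00", 2),
    (9, 1, "2018-02-22 08:00:00", 2), (10, 1, "2018-02-23 08:00:00", 1),
    (11, 1, "2018-02-24 08:00:00", 1), (12, 1, "2018-02-25 08:00:00", 2),
]
collapsed = collapse_runs(rows)
```

For person 1 the differences come out as 0, 6, 3 and 8 for the four runs, so the two type-1 runs end up in different groups even though they share person and type.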

Get temperature from live data if available, else avg over historical data

I am trying to get the live temperature for a trip or, if live data is not available, an average temperature from historical data.
I have made a simplified version of my problem, with these tables:
Trip
id departure_time arrival_time location_id
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1
Location
id name
1 Location
Weather
id temperature date location_id
1 20 2018-04-07 1
2 20 2018-04-08 1
3 20 2018-04-09 1
4 20 2018-04-10 1
5 20 2018-04-11 1
6 20 2018-04-12 1
7 20 2018-04-13 1
8 20 2018-04-14 1
9 15 2016-04-07 1
10 15 2016-04-08 1
11 15 2016-04-09 1
12 15 2016-04-10 1
13 15 2016-04-11 1
14 15 2016-04-12 1
15 15 2016-04-13 1
16 15 2016-04-14 1
17 19 2017-04-07 1
18 19 2017-04-08 1
19 19 2017-04-09 1
20 19 2017-04-10 1
21 19 2017-04-11 1
22 19 2017-04-12 1
23 19 2017-04-13 1
24 19 2017-04-14 1
25 15 2017-04-15 1
26 15 2017-04-16 1
27 15 2017-04-17 1
28 15 2017-04-18 1
29 15 2017-04-19 1
30 15 2017-04-20 1
31 15 2017-04-21 1
32 19 2016-04-15 1
33 19 2016-04-16 1
34 19 2016-04-17 1
35 19 2016-04-18 1
36 19 2016-04-19 1
37 19 2016-04-20 1
38 19 2016-04-21 1
The problem I am having is that, since these are last-minute trips, I only have "live" data for trips departing within the next week.
So I would like to get the live forecast if available, and otherwise the average temperature for the same dates in previous years.
http://sqlfiddle.com/#!17/bce59/3
This is the approach I took to try to solve the problem.
If any details have been left out, please ask.
Expected result:
id departure_time arrival_time location_id temperature
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 20
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
Use the generate_series function in a subquery to build a calendar of days from the trip table.
Then LEFT JOIN weather to that subquery on the date. Where a matching weather row exists you get its temperature; where w.temperature is NULL, fall back to the average temperature.
You can try this.
SELECT t.id,
t.departure_time,
t.arrival_time,
l.id as "location_id",
coalesce(w.temperature,(select FLOOR(avg(temperature)) from weather)) as "temperature"
FROM
location l inner join
(
select id,
location_id,
departure_time,
arrival_time,
generate_series(departure_time :: timestamp,arrival_time::timestamp,'1 day'::interval) as dates
from trip
) t on t.location_id = l.id LEFT JOIN weather w on t.dates::date = w.date::date
sqlfiddle:http://sqlfiddle.com/#!17/bce59/48
EDIT
You could use a CTE to get the average by year, instead of the subquery inside the coalesce function in the select clause.
WITH weather_avg AS (
SELECT floor(avg(a)) avgTemp
from
(
SELECT
extract(YEAR from weather.date) AS YEAR,
floor(avg(weather.temperature)) a
FROM weather
group by extract(YEAR from weather.date)
) t
)
SELECT t.id,
t.departure_time,
t.arrival_time,
t.location_id as "location_id",
coalesce(w.temperature,(select avgTemp from weather_avg)) as "temperature"
FROM
(
select t.id,
t.location_id,
t.departure_time,
t.arrival_time,
generate_series(departure_time :: timestamp,arrival_time::timestamp,'1 day'::interval) as dates
from trip t inner join location l on t.location_id = l.id
) t LEFT JOIN weather w
on t.dates::date = w.date::date
sqlfiddle:http://sqlfiddle.com/#!17/bce59/76
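Note that both queries above fall back to one overall average rather than an average for the specific calendar day. As a variation (not what the queries above do), here is a rough Python sketch that uses the live reading when one exists and otherwise averages the readings for the same month and day in earlier years. The function name trip_temperatures is made up for illustration, it handles a single location only, and the weather dictionary is built from the ranges in the question's Weather table.

```python
from datetime import date, timedelta
from statistics import mean

def trip_temperatures(departure, arrival, weather):
    """weather: dict mapping date -> temperature for one location."""
    out = []
    day = departure
    while day <= arrival:  # one row per trip day, like generate_series
        if day in weather:
            temp = weather[day]  # live reading exists for this day
        else:
            # Fall back to the same month/day in earlier years.
            history = [t for d, t in weather.items()
                       if (d.month, d.day) == (day.month, day.day)
                       and d.year < day.year]
            temp = mean(history) if history else None
        out.append((day, temp))
        day += timedelta(days=1)
    return out

# (year, first April day, last April day, temperature) from the sample table.
weather = {}
for year, first, last, temp in [
    (2018, 7, 14, 20), (2016, 7, 14, 15), (2017, 7, 14, 19),
    (2017, 15, 21, 15), (2016, 15, 21, 19),
]:
    for d in range(first, last + 1):
        weather[date(year, 4, d)] = temp

trip1 = trip_temperatures(date(2018, 4, 7), date(2018, 4, 14), weather)
trip2 = trip_temperatures(date(2018, 4, 14), date(2018, 4, 21), weather)
```

With the sample data, trip 1 gets 20 for all eight days, and trip 2 gets 20 for 2018-04-14 followed by 17 for the remaining seven days, which matches the expected result in the question.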

sql query for selecting 30 days data with time interval

My SQL query is not giving the expected result:
SELECT CAST(PR.DateTimeStamp as date) AS PRDate,COUNT(PR.ID) AS PRCount
FROM tbl_Purchase PR
INNER JOIN tbl_PurchaseCategory PTC ON PR.ID = PTC.ID
WHERE PR.DateTimeStamp BETWEEN DATEADD(DAY,-30,'2017-12-07 09:00:00') AND
'2017-12-07 09:00:00' and PR.DepartmentID=1 and PTC.CategoryID=1 group by
CAST(PR.DateTimeStamp as date) order by CAST(PR.DateTimeStamp as date)
I want to select data like this:
PRDate PRCount
2017-12-07 3 // from 2017-12-08 09:00:00 to 2017-12-07 09:00:00
2017-12-06 31 // from 2017-12-07 09:00:00 to 2017-12-06 09:00:00
2017-12-05 10 // from 2017-12-06 09:00:00 to 2017-12-05 09:00:00
2017-12-04 23
2017-12-03 27
2017-12-02 15
2017-12-01 27
2017-11-30 39
2017-11-29 25
2017-11-28 27
2017-11-27 36
2017-11-26 30
2017-11-25 23
2017-11-24 18
2017-11-23 13
2017-11-22 16
2017-11-21 25
2017-11-20 15
2017-11-19 41
2017-11-18 11
2017-11-17 9
2017-11-16 19
2017-11-15 23
2017-11-14 17
2017-11-13 23
2017-11-12 20
2017-11-11 31
2017-11-10 29
2017-11-09 18
2017-11-08 29
2017-11-07 24
The above query gives me data in 12-to-12 (midnight to midnight) intervals, not 9 to 9.
You should subtract 9 hours from the date for the group by.
SELECT
CAST( DATEADD(HOUR,-9, PR.DateTimeStamp) as date) AS PRDate
, COUNT(PR.ID) AS PRCount
FROM tbl_Purchase PR
INNER JOIN tbl_PurchaseCategory PTC ON PR.ID = PTC.ID
WHERE
PR.DateTimeStamp BETWEEN DATEADD(DAY,-30,'2017-12-07 09:00:00') AND '2017-12-07 09:00:00'
AND PR.DepartmentID=1 and PTC.CategoryID=1
group by
CAST(DATEADD(HOUR,-9, PR.DateTimeStamp) as date)
order by
CAST(DATEADD(HOUR,-9, PR.DateTimeStamp) as date)
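The effect of the 9-hour shift is easy to see in a small Python sketch: subtracting 9 hours before truncating to a date moves everything before 09:00 into the previous day's bucket. The function name count_by_shifted_day and the four timestamps are made up for illustration.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def count_by_shifted_day(timestamps, day_starts_at_hour=9):
    """Count timestamps per day, where a 'day' runs from 09:00 to 09:00."""
    shift = timedelta(hours=day_starts_at_hour)
    # Same idea as CAST(DATEADD(HOUR, -9, ts) AS date).
    return Counter((stamp - shift).date() for stamp in timestamps)

# Hypothetical purchase timestamps around the 09:00 boundary.
stamps = [
    datetime(2017, 12, 6, 8, 59),   # before 09:00, counts for 2017-12-05
    datetime(2017, 12, 6, 9, 0),    # counts for 2017-12-06
    datetime(2017, 12, 7, 8, 30),   # still counts for 2017-12-06
    datetime(2017, 12, 7, 9, 0),    # counts for 2017-12-07
]
counts = count_by_shifted_day(stamps)
```

So the bucket labelled 2017-12-06 covers 2017-12-06 09:00:00 up to, but not including, 2017-12-07 09:00:00.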