PL/SQL Finding Difference Between Start and End Dates in Different Rows

I am trying to find the difference between start and end dates in different rows of a result set, using PL/SQL. Here is an example:
ID TERM START_DATE END_DATE
423 201420 26-AUG-13 13-DEC-13
423 201430 21-JAN-14 09-MAY-14
423 201440 16-JUN-14 07-AUG-14
For any specific ID, I need to get the difference between the end date in the first record and the start date of the second record. Similarly, I need to get the difference between the end date in the second record and the start date of the third record, and so forth.
Eventually I will need to perform the same operation on a variety of IDs. I am assuming I have to use a cursor and loop.
I would appreciate any help or suggestions on accomplishing this. Thanks in advance.

The "lead" analytic function in Oracle can grab a value from the succeeding row as a value in the current row.
Given a series of rows returned from a query and a position of the cursor, LEAD provides access to a row at a given physical offset beyond that position.
Here, this SQL grabs start_date from the next row and subtracts end_date from the current row.
select id, term, start_date, end_date,
lead(start_date) over (partition by id order by term) - end_date diff_in_days
from your_table;
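Sample output (the third start date in this test data is 14-JUN-2014, not the 16-JUN-14 from the question; with the question's data the second difference would be 38):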
ID TERM START_DATE END_DATE DIFF_IN_DAYS
---------- ---------- -------------------- -------------------- ------------
423 201420 26-AUG-2013 00:00:00 13-DEC-2013 00:00:00 39
423 201430 21-JAN-2014 00:00:00 09-MAY-2014 00:00:00 36
423 201440 14-JUN-2014 00:00:00 07-AUG-2014 00:00:00
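For anyone who wants to try the LEAD approach without an Oracle instance, here is a minimal sketch using Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). The table name terms and the ISO date strings are assumptions for the demo, and julianday() stands in for Oracle's direct DATE subtraction, which SQLite does not have.

```python
import sqlite3

# In-memory stand-in for the asker's table; "terms" is an assumed name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE terms (id INTEGER, term INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO terms VALUES (?, ?, ?, ?)",
    [
        (423, 201420, "2013-08-26", "2013-12-13"),
        (423, 201430, "2014-01-21", "2014-05-09"),
        (423, 201440, "2014-06-16", "2014-08-07"),
    ],
)

# LEAD pulls the next row's start_date into the current row.
# SQLite has no DATE type, so julianday() does the subtraction that
# Oracle performs directly on DATE values.
gaps = conn.execute(
    """
    SELECT term,
           CAST(julianday(LEAD(start_date) OVER (PARTITION BY id ORDER BY term))
                - julianday(end_date) AS INTEGER) AS diff_in_days
    FROM terms
    ORDER BY id, term
    """
).fetchall()

for term, diff in gaps:
    print(term, diff)
```

The last term of each ID has no following row, so its difference comes back as NULL (None in Python). No cursor or loop is needed, and PARTITION BY id keeps the IDs separate.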

I would suggest looking at using the LEAD and LAG analytic functions from Oracle. By the sounds of it they should suit your needs.
See the docs here: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions074.htm

Code (this puts the gap on the later row, and assumes END_DATE increases with TERM, so the minimum over the previous and current rows is the previous row's END_DATE):
SELECT ID, TERM, START_DATE, END_DATE,
CASE WHEN MIN(END_DATE) OVER (PARTITION BY ID ORDER BY TERM ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) = END_DATE THEN NULL ELSE
START_DATE - MIN(END_DATE) OVER (PARTITION BY ID ORDER BY TERM ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) END AS DAYS_BETWEEN
FROM your_table

This seemed to work:
SELECT DISTINCT
ID,
TERM_CODE,
TERM_START_DATE,
TERM_END_DATE,
( LEAD ( TERM_START_DATE, 1 ) OVER ( PARTITION BY ID ORDER BY TERM_CODE ) ) - TERM_END_DATE AS DIFF_DAYS
FROM TABLE

Related

SQL to find sum of total days in a window for a series of changes

Following is the table:
start_date recorded_date id
2021-11-10 2021-11-01 1a
2021-11-08 2021-11-02 1a
2021-11-11 2021-11-03 1a
2021-11-10 2021-11-04 1a
2021-11-10 2021-11-05 1a
I need a query to find the total day changes in aggregate for a given id. In this case, it changed from 10th Nov to 8th Nov so 2 days, then again from 8th to 11th Nov so 3 days and again from 11th to 10th for a day, and finally from 10th to 10th, that is 0 days.
In total there is a change of 2+3+1+0 = 6 days for the id - '1a'.
Basically for each change there is a recorded_date, so we arrange that in ascending order and then calculate the aggregate change of days grouped by id. The final result should be like:
id Agg_Change
1a 6
Is there a way to do this using SQL? I am using a Vertica database.
Thanks.
You can use the window function LEAD to get the difference between consecutive rows and then group by id:
select id, sum(daydiff) Agg_Change
from (
select id, abs(datediff(day, start_Date, lead(start_date,1,start_date) over (partition by id order by recorded_date))) as daydiff
from tablename
) t group by id
Indeed, use LAG() to get the previous date in an OLAP query, then use an outer query to take the absolute date difference and sum it, grouping by id:
WITH
-- your input - don't use in real query ...
indata(start_date,recorded_date,id) AS (
SELECT DATE '2021-11-10',DATE '2021-11-01','1a'
UNION ALL SELECT DATE '2021-11-08',DATE '2021-11-02','1a'
UNION ALL SELECT DATE '2021-11-11',DATE '2021-11-03','1a'
UNION ALL SELECT DATE '2021-11-10',DATE '2021-11-04','1a'
UNION ALL SELECT DATE '2021-11-10',DATE '2021-11-05','1a'
)
-- real query starts here, replace following comma with "WITH" ...
,
w_lag AS (
SELECT
id
, start_date
, LAG(start_date) OVER w AS prevdt
FROM indata
WINDOW w AS (PARTITION BY id ORDER BY recorded_date)
)
SELECT
id
, SUM(ABS(DATEDIFF(DAY,start_date,prevdt))) AS dtdiff
FROM w_lag
GROUP BY id
-- out id | dtdiff
-- out ----+--------
-- out 1a | 6
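The same LAG-then-sum idea can be checked locally with a small sketch in Python's sqlite3 module (assuming SQLite 3.25 or later). The table name changes is made up for the demo, and julianday() replaces Vertica's DATEDIFF, so treat this as an illustration of the logic rather than Vertica syntax.

```python
import sqlite3

# In-memory copy of the question's data; "changes" is an assumed table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE changes (start_date TEXT, recorded_date TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO changes VALUES (?, ?, ?)",
    [
        ("2021-11-10", "2021-11-01", "1a"),
        ("2021-11-08", "2021-11-02", "1a"),
        ("2021-11-11", "2021-11-03", "1a"),
        ("2021-11-10", "2021-11-04", "1a"),
        ("2021-11-10", "2021-11-05", "1a"),
    ],
)

# Step 1: LAG fetches the previous start_date per id, ordered by recorded_date.
# Step 2: sum the absolute day differences; SUM skips the NULL of the first row.
rows = conn.execute(
    """
    WITH w_lag AS (
        SELECT id, start_date,
               LAG(start_date) OVER (PARTITION BY id ORDER BY recorded_date) AS prev_date
        FROM changes
    )
    SELECT id,
           CAST(SUM(ABS(julianday(start_date) - julianday(prev_date))) AS INTEGER) AS agg_change
    FROM w_lag
    GROUP BY id
    """
).fetchall()

print(rows)
```

For the sample data this yields 2 + 3 + 1 + 0 = 6 for id '1a'.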
I thought the LAG function would give me the answer, but it kept returning the wrong result because my logic was wrong in one place. This gives the answer I need:
with cte as(
select id, start_date, recorded_date,
row_number() over(partition by id order by recorded_date asc) as idrank,
lag(start_date,1) over(partition by id order by recorded_date asc) as prev
from table_temp
)
select id, sum(abs(date(start_date) - date(prev))) as Agg_Change
from cte
group by 1
If someone has a better solution please let me know.

Teradata get row counts for previous two days and compare

I'm trying to set up a data check where we get the row count from a table for today and for the prior load date. Since the table isn't loaded on weekends or holidays, I can't just use DATE-1.
I came up with the following to get the previous date:
SELECT
LOAD_DATE
,COUNT(LOAD_DATE) RW_COUNT
,ROW_NUMBER() OVER (ORDER BY LOAD_DATE DESC) AS LOAD_ROWNUM
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
This produces the dates, counts and assigns a row number.
LOAD_DATE RW_COUNT LOAD_ROWNUM
2019-10-16 8259 1
2019-10-15 8253 2
2019-10-11 8256 3
2019-10-10 8243 4
I want to take the two most recent dates and compare them. The most recent would be "current" and the second most recent would be "prior". Then I would like to have something like this as the result set:
CURRENT_COUNT PRIOR_COUNT DIFF_PERCENT
8259 8253 .9927
My issue is, how do I reference the first two rows and compare them to each other? Unless I'm overthinking this, I need two additional SELECT statements: one with a WHERE clause referencing row 1 and another with a WHERE clause referencing row 2.
How do I do that? Do I have two CTEs?
Eventually, I'll need a third SELECT dividing the two rows and checking for 10% tolerance. Help, I'm in analysis paralysis.
You can filter the result of an OLAP-function using QUALIFY:
SELECT
LOAD_DATE
,COUNT(LOAD_DATE) AS CURRENT_COUNT
-- previous day's count
,LEAD(CURRENT_COUNT)
OVER (ORDER BY LOAD_DATE DESC) AS PRIOR_COUNT
-- if your TD version doesn't support LAG/LEAD (i.e. < 16.10)
--,MIN(CURRENT_COUNT)
-- OVER (ORDER BY LOAD_DATE DESC
-- ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS PRIOR_COUNT
,CAST(CURRENT_COUNT AS DECIMAL(18,4)) / PRIOR_COUNT AS DIFF_PERCENT
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
-- return the latest row only
QUALIFY ROW_NUMBER() OVER (ORDER BY LOAD_DATE DESC) = 1
checking for 10% tolerance:
DIFF_PERCENT BETWEEN 0.9 and 1.1
Either ANDed to the QUALIFY or within a CASE
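To see the comparison and the tolerance check end to end, here is a rough sketch in Python's sqlite3 module (assuming SQLite 3.25 or later). It is not Teradata syntax: SQLite has no QUALIFY, so ORDER BY plus LIMIT 1 picks the latest row instead. The table daily_counts is a hypothetical pre-aggregated stand-in holding the counts from the question.

```python
import sqlite3

# "daily_counts" is a made-up table holding the per-date counts from the question,
# standing in for the GROUP BY over DATABASE1.TABLE1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_counts (load_date TEXT, rw_count INTEGER)")
conn.executemany(
    "INSERT INTO daily_counts VALUES (?, ?)",
    [
        ("2019-10-16", 8259),
        ("2019-10-15", 8253),
        ("2019-10-11", 8256),
        ("2019-10-10", 8243),
    ],
)

# With a descending sort, LEAD looks at the next row, i.e. the prior load date.
# LIMIT 1 keeps only the latest load date (QUALIFY does this in Teradata).
current_count, prior_count = conn.execute(
    """
    SELECT rw_count,
           LEAD(rw_count) OVER (ORDER BY load_date DESC) AS prior_count
    FROM daily_counts
    ORDER BY load_date DESC
    LIMIT 1
    """
).fetchone()

# 10% tolerance check, mirroring DIFF_PERCENT BETWEEN 0.9 AND 1.1.
ratio = current_count / prior_count
within_tolerance = 0.9 <= ratio <= 1.1

print(current_count, prior_count, round(ratio, 4), within_tolerance)
```

Because the window is ordered by load date rather than by calendar arithmetic, weekends and holidays with no load are skipped automatically.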
I don't know what you want for your result set. But you can use LAG() with aggregation to get the previous value.
SELECT LOAD_DATE, COUNT(*) as RW_COUNT,
LAG(COUNT(*)) OVER (ORDER BY LOAD_DATE) as PREV_RW_COUNT
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1;
You may just want a difference of the two counts.
If your TD version (16.0+?) doesn't support LEAD/LAG, give this a try:
SELECT
load_date,
RW_COUNT,
MAX(RW_COUNT) OVER(
ORDER BY load_date
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING -- Get the previous load date's value
) AS RW_COUNT_prev
FROM (
SELECT load_date, COUNT(LOAD_DATE) RW_COUNT
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
) src

Getting maximum sequential streak with events - updated question

I've previously posted a similar question to this, but an update to the parameters means that the solution posted there wouldn't work, and I've had trouble working out how to integrate the revised requirement. I'm not sure of the protocol here; it appears that I can't post an updated question to the original post, "Getting maximum sequential streak with events".
I’m looking for a single query, if possible, running PostgreSQL 9.6.6 under pgAdmin3 v1.22.1
I have a table with a date and a row for each event on the date:
Date Events
2018-12-10 1
2018-12-10 1
2018-12-10 0
2018-12-09 1
2018-12-08 0
2018-12-08 0
2018-12-07 1
2018-12-06 1
2018-12-06 1
2018-12-06 0
2018-12-06 1
2018-12-04 1
2018-12-03 0
I’m looking for the longest sequence of dates without a break. In this case, 2018-12-08 and 2018-12-03 are the only dates with no events. There are two dates with events between 2018-12-08 and today, and three between 2018-12-08 and 2018-12-03, so I would like the answer of 3.
I know I can group them together with something like:
Select Date, count(Date) from Table group by Date order by Date Desc
To get just the most recent sequence, I’ve got something like this- the subquery returns the most recent date with no events, and the outer query counts the dates after that date:
select date, count(distinct date) from Table
where date>
( select date from Table
group by date
having count (case when Events is not null then 1 else null end) = 0
order by date desc
fetch first row only)
group by date
But now I need the longest streak, not just the most recent streak.
I had assumed when I posted previously that there were rows for every date in the range. But this assumption wasn't correct, so the answer given doesn't work. I also need the query to return the start and end date for the range.
Thank you!
You can assign groups by doing a cumulative count of the 0s. Then count the distinct dates in each group:
select count(*), min(date), max(date), count(distinct date)
from (select t.*,
count(*) filter (where events = 0) over (order by date) as grp
from t
) t
group by grp
order by count(distinct date) desc
limit 1;
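Since the sample data has several rows per date, one possible variation is to aggregate to one row per date first and only then take the cumulative count of zero-event dates. The sketch below tries that in Python's sqlite3 module (assuming SQLite 3.25 or later). The column name event_date is an assumption, and a date that is missing from the table (such as 2018-12-05) is assumed not to break a streak, which is what the expected answer of 3 implies.

```python
import sqlite3

# In-memory copy of the question's rows; "event_date" is an assumed column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (event_date TEXT, events INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2018-12-10", 1), ("2018-12-10", 1), ("2018-12-10", 0),
        ("2018-12-09", 1),
        ("2018-12-08", 0), ("2018-12-08", 0),
        ("2018-12-07", 1),
        ("2018-12-06", 1), ("2018-12-06", 1), ("2018-12-06", 0), ("2018-12-06", 1),
        ("2018-12-04", 1),
        ("2018-12-03", 0),
    ],
)

best = conn.execute(
    """
    WITH per_day AS (            -- one row per date with its total events
        SELECT event_date, SUM(events) AS n_events
        FROM t
        GROUP BY event_date
    ),
    grouped AS (                 -- running count of zero-event dates = streak id
        SELECT event_date, n_events,
               SUM(CASE WHEN n_events = 0 THEN 1 ELSE 0 END)
                   OVER (ORDER BY event_date) AS grp
        FROM per_day
    )
    SELECT COUNT(*) AS streak, MIN(event_date), MAX(event_date)
    FROM grouped
    WHERE n_events > 0           -- drop the zero-event date that opens each group
    GROUP BY grp
    ORDER BY streak DESC
    LIMIT 1
    """
).fetchone()

print(best)
```

On the sample data this returns a streak of 3 dates running from 2018-12-04 to 2018-12-07, which also gives the start and end dates the question asks for.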

SQL to display keys of leading, lagging record

I have data in a table that can be represented by the SQL below:
SELECT T.VERSION_ID T_VERSION_ID
,cast(T.START_DATE As Date) as T_START_DATE
,cast(ISNULL( LEAD (START_DATE) OVER (ORDER BY START_DATE),'9999-12-31') As Date) as CALC_END_DATE_LEAD
,cast(ISNULL( LAG (START_DATE) OVER (ORDER BY START_DATE),'9999-12-31') As Date) as CALC_END_DATE_LAG
FROM(select 'Vrandom1' as VERSION_ID
,cast('22-MAR-2018' As Date) as start_date
,'9999-12-31' as end_date
, 1 as is_approved
union
select 'Vrandom2' as VERSION_ID
,cast('28-MAR-2018' As Date) as start_date
,'9999-12-31' as end_date
,1 as is_approved
union
select 'Vrandom3' as VERSION_ID
,cast('25-MAR-2018' As date) as start_date
,'9999-12-31' as end_date
,1 as is_approved
) as T
Output
T_VERSION_ID T_START_DATE CALC_END_DATE_LEAD CALC_END_DATE_LAG
Vrandom1 22/03/2018 25/03/2018 31/12/9999
Vrandom3 25/03/2018 28/03/2018 22/03/2018
Vrandom2 28/03/2018 31/12/9999 25/03/2018
This table is used inside application where one record say with version "Vrandom3" will be in effect. For processing, I need to find keys of immediate leading and lagging record as per start date. i.e. I would need to display Vrandom2 and Vrandom1 as the keys of leading and lagging record.
Desired result in the application:
T_VERSION_ID T_START_DATE CALC_END_DATE_LEAD CALC_END_DATE_LAG key_leading key_lagging
Vrandom3 25/03/2018 28/03/2018 22/03/2018 Vrandom2 Vrandom1
or
T_VERSION_ID T_START_DATE CALC_END_DATE_LEAD CALC_END_DATE_LAG key_leading key_lagging
Vrandom1 22/03/2018 25/03/2018 31/12/9999 Vrandom3 null
I can think of joining inline views based on start_date but is there any better way to achieve this?
LAG (there's also LEAD) windowing function
Accesses data from a previous row in the same result set without the
use of a self-join starting with SQL Server 2012. LAG provides access
to a row at a given physical offset that comes before the current row.
Use this analytic function in a SELECT statement to compare values in
the current row with values in a previous row.
These functions are designed to get leading and lagging rows.
Example from the link:
USE AdventureWorks2012;
GO
SELECT BusinessEntityID, YEAR(QuotaDate) AS SalesYear, SalesQuota AS CurrentQuota,
LAG(SalesQuota, 1,0) OVER (ORDER BY YEAR(QuotaDate)) AS PreviousQuota
FROM Sales.SalesPersonQuotaHistory
WHERE BusinessEntityID = 275 and YEAR(QuotaDate) IN ('2005','2006');
How about adding:
,LEAD (key_col) OVER (ORDER BY START_DATE) as Key_col_LEAD
,LAG (key_col) OVER (ORDER BY START_DATE) as Key_col_LAG
to your SELECT
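As a quick check of that idea, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25 or later) with VERSION_ID as the key column. The table name versions is made up for the demo; in SQL Server the two LEAD/LAG expressions would be used the same way.

```python
import sqlite3

# "versions" is an assumed table name holding the three rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE versions (version_id TEXT, start_date TEXT)")
conn.executemany(
    "INSERT INTO versions VALUES (?, ?)",
    [
        ("Vrandom1", "2018-03-22"),
        ("Vrandom2", "2018-03-28"),
        ("Vrandom3", "2018-03-25"),
    ],
)

# LEAD/LAG can return any column of the neighbouring row, not just the
# column used for ordering, so the key comes back without a self-join.
result = conn.execute(
    """
    SELECT version_id,
           start_date,
           LEAD(version_id) OVER (ORDER BY start_date) AS key_leading,
           LAG(version_id)  OVER (ORDER BY start_date) AS key_lagging
    FROM versions
    ORDER BY start_date
    """
).fetchall()

for row in result:
    print(row)
```

The row for Vrandom3 comes back with Vrandom2 as key_leading and Vrandom1 as key_lagging, and the first and last rows get NULL on the side where no neighbour exists.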

select query to select closest date which is less then or equal to current date in postgresql

This is my table; its name is resource_calendar.
For each resource_id, I want to select the row whose effective date is less than or equal to the current date and closest to it.
What would be the right query in PostgreSQL?
The query should select effective date 22 for resource_id=3 and effective date 21 for resource_id=7,
so the result should be:
id resource_id calendar_id applied_on effective_date version
19 3 6 2016-12-22 11:13:26.53 2016-12-22 0
26 7 5 2016-12-22 11:16:26.53 2016-12-21 0
SELECT t.*
FROM
(
SELECT id, resource_id, calendar_id, applied_on, effective_date, version,
MIN(ABS(EXTRACT(EPOCH FROM (current_timestamp - effective_date))))
OVER (PARTITION BY resource_id) AS diff
FROM resource_calendar
WHERE EXTRACT(EPOCH FROM (current_timestamp - effective_date)) > 0
) t
WHERE ABS(EXTRACT(EPOCH FROM (current_timestamp - t.effective_date))) = t.diff
This query forms a partition by resource_id on the resource_calendar table. You can think of this partition as logically grouping together the records that have the same resource_id. For each such group of records, it computes the smallest difference between the effective_date and the current timestamp, where the effective_date must be earlier than the current timestamp.
The outer query then identifies those records having this minimum timestamp difference.
Postgres has some reasonably helpful documentation on using window functions if you feel you need more information.
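A related way to express the same per-resource partitioning is to rank the rows in each partition with ROW_NUMBER() and keep the first one. The sketch below uses Python's sqlite3 module (assuming SQLite 3.25 or later) with a trimmed-down table. Rows 18 and 27 are hypothetical and added only to show the filtering, and the current date is fixed to 2016-12-22 so the output is reproducible.

```python
import sqlite3

# Trimmed-down resource_calendar. Rows 19 and 26 come from the expected
# result in the question; rows 18 and 27 are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_calendar (id INTEGER, resource_id INTEGER, effective_date TEXT)"
)
conn.executemany(
    "INSERT INTO resource_calendar VALUES (?, ?, ?)",
    [
        (18, 3, "2016-12-20"),  # hypothetical older row
        (19, 3, "2016-12-22"),
        (26, 7, "2016-12-21"),
        (27, 7, "2016-12-25"),  # hypothetical future row, must be excluded
    ],
)

today = "2016-12-22"  # fixed stand-in for CURRENT_DATE

# Within each resource_id partition, number rows from newest to oldest
# among those not in the future, then keep row number 1.
latest = conn.execute(
    """
    SELECT id, resource_id, effective_date
    FROM (
        SELECT id, resource_id, effective_date,
               ROW_NUMBER() OVER (
                   PARTITION BY resource_id
                   ORDER BY effective_date DESC
               ) AS rn
        FROM resource_calendar
        WHERE effective_date <= ?
    )
    WHERE rn = 1
    ORDER BY resource_id
    """,
    (today,),
).fetchall()

print(latest)
```

Unlike matching on the minimum difference, ROW_NUMBER() returns exactly one row per resource even if two rows share the same effective date.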
You can use this simple query:
SELECT DISTINCT ON(resource_id) *
FROM planner.resource_calendar
WHERE effective_date <= CURRENT_DATE
ORDER BY resource_id, effective_date desc;