SQL to display keys of leading, lagging record - sql

I have data in a table that can be represented by the SQL below:
SELECT T.VERSION_ID T_VERSION_ID
,cast(T.START_DATE As Date) as T_START_DATE
,cast(ISNULL( LEAD (START_DATE) OVER (ORDER BY START_DATE),'9999-12-31') As Date) as CALC_END_DATE_LEAD
,cast(ISNULL( LAG (START_DATE) OVER (ORDER BY START_DATE),'9999-12-31') As Date) as CALC_END_DATE_LAG
FROM(select 'Vrandom1' as VERSION_ID
,cast('22-MAR-2018' As Date) as start_date
,'9999-12-31' as end_date
, 1 as is_approved
union
select 'Vrandom2' as VERSION_ID
,cast('28-MAR-2018' As Date) as start_date
,'9999-12-31' as end_date
,1 as is_approved
union
select 'Vrandom3' as VERSION_ID
,cast('25-MAR-2018' As date) as start_date
,'9999-12-31' as end_date
,1 as is_approved
) as T
Output:
T_VERSION_ID  T_START_DATE  CALC_END_DATE_LEAD  CALC_END_DATE_LAG
Vrandom1      22/03/2018    25/03/2018          31/12/9999
Vrandom3      25/03/2018    28/03/2018          22/03/2018
Vrandom2      28/03/2018    31/12/9999          25/03/2018
This table is used inside an application where one record, say the one with version "Vrandom3", will be in effect. For processing, I need to find the keys of the immediately leading and lagging records by start date, i.e. I would need to display Vrandom2 and Vrandom1 as the keys of the leading and lagging records.
Desired result in the application:
T_VERSION_ID  T_START_DATE  CALC_END_DATE_LEAD  CALC_END_DATE_LAG  key_leading  key_lagging
Vrandom3      25/03/2018    28/03/2018          22/03/2018         Vrandom2     Vrandom1
or
T_VERSION_ID  T_START_DATE  CALC_END_DATE_LEAD  CALC_END_DATE_LAG  key_leading  key_lagging
Vrandom1      22/03/2018    25/03/2018          31/12/9999         Vrandom3     null
I could join inline views on start_date, but is there a better way to achieve this?

LAG (there's also LEAD) windowing function:
Accesses data from a previous row in the same result set without the use of a self-join starting with SQL Server 2012. LAG provides access to a row at a given physical offset that comes before the current row. Use this analytic function in a SELECT statement to compare values in the current row with values in a previous row.
These functions are designed to get leading and lagging rows.
Example from the link:
USE AdventureWorks2012;
GO
SELECT BusinessEntityID, YEAR(QuotaDate) AS SalesYear, SalesQuota AS CurrentQuota,
LAG(SalesQuota, 1,0) OVER (ORDER BY YEAR(QuotaDate)) AS PreviousQuota
FROM Sales.SalesPersonQuotaHistory
WHERE BusinessEntityID = 275 and YEAR(QuotaDate) IN ('2005','2006');
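To see what the optional offset and default arguments in LAG(SalesQuota, 1, 0) change, here is a minimal sketch run against SQLite from Python instead of SQL Server (assumption: the bundled SQLite is 3.25 or newer, which added LAG/LEAD). The quota_history table and its numbers are hypothetical, not AdventureWorks data.

```python
import sqlite3

# In-memory SQLite stand-in for Sales.SalesPersonQuotaHistory (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quota_history (sales_year INTEGER, quota REAL)")
conn.executemany(
    "INSERT INTO quota_history VALUES (?, ?)",
    [(2005, 100.0), (2006, 150.0), (2007, 120.0)],
)

rows = conn.execute(
    """
    SELECT sales_year,
           quota AS current_quota,
           -- one-argument form: the first row has no previous row, so NULL
           LAG(quota) OVER (ORDER BY sales_year) AS prev_or_null,
           -- offset 1 with default 0, as in LAG(SalesQuota, 1, 0)
           LAG(quota, 1, 0) OVER (ORDER BY sales_year) AS prev_or_zero
    FROM quota_history
    ORDER BY sales_year
    """
).fetchall()

for row in rows:
    print(row)
```

Only the first row differs between the two forms: it prints (2005, 100.0, None, 0), where the default replaces the missing previous value.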

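How about adding:
,LEAD(key_col) OVER (ORDER BY START_DATE) AS Key_col_LEAD
,LAG(key_col) OVER (ORDER BY START_DATE) AS Key_col_LAG
to your SELECT, where key_col is your key column (VERSION_ID in your sample). Without a default value, the first row's LAG and the last row's LEAD are NULL, which matches your desired output.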
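The suggestion can be checked end to end. This sketch uses Python's sqlite3 module with SQLite standing in for SQL Server (assumption: SQLite 3.25+ for LAG/LEAD); the versions table name is made up, and the rows are the question's sample. The window functions are computed in a subquery and filtered outside it, because a WHERE version_id = ... in the same query would remove the neighboring rows before LEAD/LAG could see them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE versions (version_id TEXT, start_date TEXT)")
conn.executemany(
    "INSERT INTO versions VALUES (?, ?)",
    [("Vrandom1", "2018-03-22"), ("Vrandom2", "2018-03-28"), ("Vrandom3", "2018-03-25")],
)

# Neighbors are computed over the whole table first, then one version is picked.
QUERY = """
SELECT version_id, start_date, key_leading, key_lagging
FROM (
    SELECT version_id,
           start_date,
           LEAD(version_id) OVER (ORDER BY start_date) AS key_leading,
           LAG(version_id)  OVER (ORDER BY start_date) AS key_lagging
    FROM versions
)
WHERE version_id = ?
"""


def neighbors(version_id):
    """Return (version, start date, leading key, lagging key) for one version."""
    return conn.execute(QUERY, (version_id,)).fetchone()


print(neighbors("Vrandom3"))  # ('Vrandom3', '2018-03-25', 'Vrandom2', 'Vrandom1')
print(neighbors("Vrandom1"))  # ('Vrandom1', '2018-03-22', 'Vrandom3', None)
```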

Related

Teradata get row counts for previous two days and compare

I'm trying to set up a data check where we get the row count from a table for today and for the prior load date. Since the table isn't loaded on weekends or holidays, I can't just use DATE-1.
I came up with the following to get the previous date:
SELECT
LOAD_DATE
,COUNT(LOAD_DATE) RW_COUNT
,ROW_NUMBER() OVER (ORDER BY LOAD_DATE DESC) AS LOAD_ROWNUM
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
This produces the dates, counts and assigns a row number.
LOAD_DATE RW_COUNT LOAD_ROWNUM
2019-10-16 8259 1
2019-10-15 8253 2
2019-10-11 8256 3
2019-10-10 8243 4
I need to take the two most recent dates and compare them. The most recent would be "current" and the second most recent would be "prior". Then I would like something like this as the result set:
CURRENT_COUNT PRIOR_COUNT DIFF_PERCENT
8259 8253 .9927
My issue is: how do I reference the first two rows and compare them to each other? Unless I'm overthinking this, I need two additional SELECT statements: one with a WHERE clause referencing row 1 and another with a WHERE clause referencing row 2.
How do I do that? Do I have two CTEs?
Eventually, I'll need a third SELECT dividing the two rows and checking for 10% tolerance. Help, I'm in analysis paralysis.
You can filter the result of an OLAP function using QUALIFY:
SELECT
LOAD_DATE
,COUNT(LOAD_DATE) AS CURRENT_COUNT
-- previous load date's count
,LEAD(COUNT(LOAD_DATE))
OVER (ORDER BY LOAD_DATE DESC) AS PRIOR_COUNT
-- if your TD version doesn't support LAG/LEAD (i.e. before 16.10)
--,MIN(COUNT(LOAD_DATE))
-- OVER (ORDER BY LOAD_DATE DESC
-- ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS PRIOR_COUNT
,CAST(CURRENT_COUNT AS DECIMAL(18,4)) / PRIOR_COUNT AS DIFF_PERCENT
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
-- return the latest row only
QUALIFY ROW_NUMBER() OVER (ORDER BY LOAD_DATE DESC) = 1
To check for the 10% tolerance, use:
DIFF_PERCENT BETWEEN 0.9 AND 1.1
either ANDed to the QUALIFY clause or inside a CASE expression.
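The QUALIFY approach can be approximated outside Teradata. This sketch uses SQLite through Python's sqlite3 module (assumption: SQLite 3.25+); SQLite has no QUALIFY, so ROW_NUMBER() is computed in a subquery and filtered in the outer query, and the WHERE LOAD_DATE >= DATE-6 filter is left out because date arithmetic differs. The per-date row counts are the ones from the question's sample output; table1 is a stand-in for DATABASE1.TABLE1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (load_date TEXT)")
# One row per loaded record, using the counts from the question's sample output.
counts = {"2019-10-16": 8259, "2019-10-15": 8253, "2019-10-11": 8256, "2019-10-10": 8243}
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [(day,) for day, n in counts.items() for _ in range(n)],
)

QUERY = """
SELECT load_date, current_count, prior_count,
       CAST(current_count AS REAL) / prior_count AS ratio
FROM (
    SELECT load_date,
           COUNT(*) AS current_count,
           -- next row in descending date order is the previous load date
           LEAD(COUNT(*)) OVER (ORDER BY load_date DESC) AS prior_count,
           ROW_NUMBER() OVER (ORDER BY load_date DESC) AS rn
    FROM table1
    GROUP BY load_date
)
WHERE rn = 1
"""
load_date, current_count, prior_count, ratio = conn.execute(QUERY).fetchone()
within_tolerance = 0.9 <= ratio <= 1.1  # the 10% check
print(load_date, current_count, prior_count, round(ratio, 4), within_tolerance)
# 2019-10-16 8259 8253 1.0007 True
```

The CAST to REAL plays the role of the DECIMAL(18,4) cast: without it, SQLite would divide two integers and truncate the result.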
I'm not sure what you want for your result set, but you can use LAG() with aggregation to get the previous value.
SELECT LOAD_DATE, COUNT(*) as RW_COUNT,
LAG(COUNT(*)) OVER (ORDER BY LOAD_DATE) as PREV_RW_COUNT
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1;
You may just want a difference of the two counts.
If your TD version doesn't support LEAD/LAG (releases before 16.10), give this a try:
SELECT
load_date,
RW_COUNT,
MAX(RW_COUNT) OVER(
ORDER BY load_date
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING -- previous (earlier) load date's count
) AS RW_COUNT_prev
FROM (
SELECT load_date, COUNT(LOAD_DATE) RW_COUNT
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
) src
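The frame trick works because ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING is a one-row window holding only the previous row, so MAX over it returns that row's value (or NULL for the first row), which is exactly what LAG returns. A quick check in SQLite via Python (assumption: SQLite 3.25+), with a hypothetical daily_counts table standing in for the src derived table and the question's counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_counts (load_date TEXT, rw_count INTEGER)")
conn.executemany(
    "INSERT INTO daily_counts VALUES (?, ?)",
    [
        ("2019-10-10", 8243),
        ("2019-10-11", 8256),
        ("2019-10-15", 8253),
        ("2019-10-16", 8259),
    ],
)

rows = conn.execute(
    """
    SELECT load_date,
           rw_count,
           -- one-row frame ending one row before the current row: emulates LAG
           MAX(rw_count) OVER (ORDER BY load_date
                               ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS via_frame,
           LAG(rw_count) OVER (ORDER BY load_date) AS via_lag
    FROM daily_counts
    ORDER BY load_date
    """
).fetchall()

for row in rows:
    print(row)
```

Both columns agree on every row; the first row gets None from each, since its frame is empty.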

Find the date after a gap in date range in sql

I have these date ranges that represent start and end dates of subscription. There are no overlaps in date ranges.
Start Date End Date
1/5/2015 - 1/14/2015
1/15/2015 - 1/20/2015
1/24/2015 - 1/28/2015
1/29/2015 - 2/3/2015
I want to identify delays of more than 1 day between any subscription ending and the next one starting. For example, for the data above, I want the output: 1/24/2015 - 1/28/2015.
How can I do this using a sql query?
Edit: There can also be multiple gaps in the subscription date ranges, but I want the date range after the latest gap.
You can do this using a LEFT JOIN or NOT EXISTS:
select t.*
from t
where not exists (select 1
from t t2
where t2.enddate = dateadd(day, -1, t.startdate)
);
Note that this will also give you the first record in the sequence . . . which, strictly speaking, matches the conditions. Here is one solution to that problem:
select t.*
from t cross join
(select min(startdate) as minsd from t) as x
where not exists (select 1
from t t2
where t2.enddate = dateadd(day, -1, t.startdate)
) and
t.startdate <> minsd;
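The NOT EXISTS version with the first record excluded can be tried on the question's four ranges. This is a sketch in SQLite via Python rather than SQL Server (assumption: dates stored as ISO-8601 text), so date(t.startdate, '-1 day') stands in for dateadd(day, -1, t.startdate); the table name t comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ISO-8601 text dates, so SQLite's date() arithmetic and comparisons work.
conn.execute("CREATE TABLE t (startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2015-01-05", "2015-01-14"),
        ("2015-01-15", "2015-01-20"),
        ("2015-01-24", "2015-01-28"),
        ("2015-01-29", "2015-02-03"),
    ],
)

QUERY = """
SELECT t.startdate, t.enddate
FROM t CROSS JOIN (SELECT MIN(startdate) AS minsd FROM t) AS x
WHERE NOT EXISTS (SELECT 1
                  FROM t AS t2
                  -- SQLite equivalent of dateadd(day, -1, t.startdate)
                  WHERE t2.enddate = date(t.startdate, '-1 day'))
  AND t.startdate <> x.minsd
ORDER BY t.startdate
"""
result = conn.execute(QUERY).fetchall()
print(result)  # [('2015-01-24', '2015-01-28')]
```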
You can also approach this with window functions:
select t.*
from (select t.*,
lag(enddate) over (order by startdate) as prev_enddate,
min(startdate) over () as min_startdate
from t
) t
where min_startdate <> startdate and
prev_enddate <> dateadd(day, -1, startdate);
Also note that this logic assumes that the time periods do not overlap. If they do, a clearer problem statement is needed to understand what you are really looking for.
You can achieve this with the window function LAG(), which fetches a value from the previous row in an ordered set so it can be compared in the WHERE clause. In the WHERE clause you then apply your "gap" definition and discard the first row.
Sample data:
create table dates(start_date date, end_date date);
insert into dates values
('2015-01-05','2015-01-14'),
('2015-01-15','2015-01-20'),
('2015-01-24','2015-01-28'), -- gap
('2015-01-29','2015-02-03'),
('2015-02-04','2015-02-07'),
('2015-02-09','2015-02-11'); -- gap
Query
SELECT
start_date,
end_date
FROM (
SELECT
start_date,
end_date,
LAG(end_date, 1) OVER (ORDER BY start_date) AS prev_end_date
FROM dates
) foo
WHERE
start_date IS DISTINCT FROM ( prev_end_date + 1 ) -- compare current row start_date with previous row end_date + 1 day
AND prev_end_date IS NOT NULL -- discard first row, which has null value in LAG() calculation
I assume that there are no overlaps in your data and that there are unique values for each pair. If that's not the case, you need to clarify this.
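Neither query above handles the question's edit (only the range after the latest gap). A minimal sketch of one way to do it, in SQLite via Python instead of PostgreSQL (assumption: SQLite 3.25+, ISO-8601 text dates), is to keep the LAG() gap test and sort the gaps descending with LIMIT 1. Here "gap" is taken strictly as starting more than one day after the previous end, and the data is the sample above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO dates VALUES (?, ?)",
    [
        ("2015-01-05", "2015-01-14"),
        ("2015-01-15", "2015-01-20"),
        ("2015-01-24", "2015-01-28"),  # gap
        ("2015-01-29", "2015-02-03"),
        ("2015-02-04", "2015-02-07"),
        ("2015-02-09", "2015-02-11"),  # gap
    ],
)

BASE = """
SELECT start_date, end_date
FROM (
    SELECT start_date, end_date,
           LAG(end_date) OVER (ORDER BY start_date) AS prev_end_date
    FROM dates
)
WHERE prev_end_date IS NOT NULL
  AND start_date > date(prev_end_date, '+1 day')
"""
# IS NOT NULL drops the first row; the second condition keeps ranges that start
# more than one day after the previous one ended (ISO text sorts chronologically).
all_gaps = conn.execute(BASE + "ORDER BY start_date").fetchall()
# The question's edit: only the range after the latest gap.
latest_gap = conn.execute(BASE + "ORDER BY start_date DESC LIMIT 1").fetchone()

print(all_gaps)    # [('2015-01-24', '2015-01-28'), ('2015-02-09', '2015-02-11')]
print(latest_gap)  # ('2015-02-09', '2015-02-11')
```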

PL/SQL Finding Difference Between Start and End Dates in Different Rows

I am trying to find the difference between start and end dates in different rows of a result set, using PL/SQL. Here is an example:
ID TERM START_DATE END_DATE
423 201420 26-AUG-13 13-DEC-13
423 201430 21-JAN-14 09-MAY-14
423 201440 16-JUN-14 07-AUG-14
For any specific ID, I need to get the difference between the end date in the first record and the start date of the second record. Similarly, I need to get the difference between the end date in the second record and the start date of the third record, and so forth.
Eventually I will need to perform the same operation on a variety of IDs. I am assuming I have to use a cursor and loop.
I would appreciate any help or suggestions on accomplishing this. Thanks in advance.
The "lead" analytic function in Oracle can grab a value from the succeeding row as a value in the current row.
Given a series of rows returned from a query and a position of the cursor, LEAD provides access to a row at a given physical offset beyond that position.
Here, this SQL grabs start_date from the next row and subtracts the current row's end_date from it.
select id, term, start_date, end_date,
lead(start_date) over (partition by id order by term) - end_date diff_in_days
from your_table;
Sample output:
ID TERM START_DATE END_DATE DIFF_IN_DAYS
---------- ---------- -------------------- -------------------- ------------
423 201420 26-AUG-2013 00:00:00 13-DEC-2013 00:00:00 39
423 201430 21-JAN-2014 00:00:00 09-MAY-2014 00:00:00 38
423 201440 16-JUN-2014 00:00:00 07-AUG-2014 00:00:00
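Since the question also asks about handling many IDs without a cursor, here is a sketch of the same LEAD query run in SQLite from Python instead of Oracle (assumption: SQLite 3.25+). SQLite cannot subtract dates directly, so julianday() converts them to day numbers first. ID 423 uses the question's rows; ID 500 is hypothetical, added only to show PARTITION BY restarting per ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER, term INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (423, 201420, "2013-08-26", "2013-12-13"),
        (423, 201430, "2014-01-21", "2014-05-09"),
        (423, 201440, "2014-06-16", "2014-08-07"),
        # hypothetical second ID
        (500, 201420, "2013-08-26", "2013-12-13"),
        (500, 201430, "2014-01-06", "2014-05-09"),
    ],
)

rows = conn.execute(
    """
    SELECT id, term,
           -- next term's start minus this term's end, in days
           CAST(julianday(LEAD(start_date) OVER (PARTITION BY id ORDER BY term))
                - julianday(end_date) AS INTEGER) AS diff_in_days
    FROM your_table
    ORDER BY id, term
    """
).fetchall()

for row in rows:
    print(row)
```

Each ID's last term prints None, since there is no following term in its partition; one set-based query covers every ID, so no loop is needed.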
I would suggest looking at using the LEAD and LAG analytic functions from Oracle. By the sounds of it they should suit your needs.
See the docs here: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions074.htm
Code (SQL Server syntax):
SELECT [ID], [TERM], [START_DATE], [END_DATE],
CASE WHEN MIN([END_DATE]) OVER(PARTITION BY [ID] ORDER BY [TERM] ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)=[END_DATE] THEN NULL ELSE
DATEDIFF(DAY, MIN([END_DATE]) OVER(PARTITION BY [ID] ORDER BY [TERM] ROWS BETWEEN 1 PRECEDING AND CURRENT ROW), [START_DATE]) END AS [DAYS_BETWEEN]
FROM [TABLE]
This seemed to work:
SELECT DISTINCT
ID,
TERM_CODE,
TERM_START_DATE,
TERM_END_DATE,
LEAD(TERM_START_DATE, 1) OVER (PARTITION BY ID ORDER BY TERM_CODE) - TERM_END_DATE AS DIFF_DAYS
FROM your_table

Select only the MIN values in ORACLE SQL

I want to select the earliest date and time from my data set and show only the row(s) that match it, with three columns.
I got it to show the data ordered by date and time. How can I get it to show only the rows with the minimum values? I tried using FIRST, LIMIT, and TOP x, but they don't work, and they aren't exactly what I need since there may be more than one matching row.
Here is my example sql:
Select report, date, time
From events
order by date, time
Try this:
SELECT report, date, time
FROM (SELECT report, date, time,
ROW_NUMBER() OVER(PARTITION BY report ORDER BY date ASC, time ASC) AS RowNum
From events
) CTE
WHERE CTE.RowNum = 1
Something like this should work assuming that every row has a validly formatted day and time component.
SELECT report,
dt,
time
FROM (SELECT report,
dt,
time,
rank() over (partition by report
order by to_date( dt || ' ' || time, 'MM/DD/YYYY HH24MI' ) asc) rnk
FROM events)
WHERE rnk = 1
From a data model standpoint, however, you should always store dates in DATE columns rather than in VARCHAR2 columns. Since you want date comparison and sorting semantics, you'll have to transform the data into a DATE, which is costly at runtime. And there is a good chance that someone will eventually store data in a different format in the column or store an invalid string (e.g. a day of '02/29/2011'), which will cause your query to start generating errors.
Guessing a bit as the data types aren't clear, but something like this might work (example using a CTE to generate dummy data):
with events as (
select 'report1' as report, '01/01/2012' as date_field, '0800' as time_field
from dual
union all select 'report1', '01/01/2012', '0900' from dual
union all select 'report1', '01/02/2012', '0930' from dual
union all select 'report2', '01/01/2012', '0900' from dual
union all select 'report2', '01/01/2012', '0900' from dual
union all select 'report2', '01/01/2012', '1000' from dual
)
select report, date_field, time_field
from (
select report, date_field, time_field,
row_number() over (partition by report
order by to_date(date_field, 'MM/DD/YYYY'), time_field) as rn
from events
)
where rn = 1
order by report;
REPORT DATE_FIELD TIME
------- ---------- ----
report1 01/01/2012 0800
report2 01/01/2012 0900
You may have a different date format mask; I've assumed US format as you referred to 'military time'.
Depending on how you want to treat ties, you'll want rank or dense_rank instead of row_number. See the documentation of analytic functions for more info. As Justin pointed out, you probably want rank, which with the same data gives:
REPORT DATE_FIELD TIME
------- ---------- ----
report1 01/01/2012 0800
report2 01/01/2012 0900
report2 01/01/2012 0900
The inner select adds an extra rn column that assigns a ranking to each result; each value of report will have at least one row that gets assigned 1 (if using rank, otherwise exactly one), and possibly rows with 2, 3 etc. The one(s) with 1 will have the earliest date/time for that report. The outer query then filters to only show those ranked 1, via the where rn = 1 clause, hence only giving the data with the earliest date/time for each report - the rest is discarded.
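The tie handling described above can be reproduced with the same dummy rows. This sketch runs in SQLite via Python rather than Oracle (assumption: SQLite 3.25+); the dates are stored as ISO-8601 text here, so plain sorting replaces to_date(). The earliest() helper is hypothetical and only splices in one of the three ranking function names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (report TEXT, date_field TEXT, time_field TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("report1", "2012-01-01", "0800"),
        ("report1", "2012-01-01", "0900"),
        ("report1", "2012-01-02", "0930"),
        ("report2", "2012-01-01", "0900"),
        ("report2", "2012-01-01", "0900"),
        ("report2", "2012-01-01", "1000"),
    ],
)


def earliest(rank_fn):
    """Rows ranked 1 per report by date then time, using the given ranking function."""
    # rank_fn is spliced into the SQL text, so only accept known function names.
    if rank_fn not in ("ROW_NUMBER", "RANK", "DENSE_RANK"):
        raise ValueError(rank_fn)
    sql = f"""
        SELECT report, date_field, time_field
        FROM (
            SELECT report, date_field, time_field,
                   {rank_fn}() OVER (PARTITION BY report
                                     ORDER BY date_field, time_field) AS rn
            FROM events
        )
        WHERE rn = 1
        ORDER BY report
    """
    return conn.execute(sql).fetchall()


print(earliest("ROW_NUMBER"))  # one row per report
print(earliest("RANK"))        # report2's tied 0900 rows both survive
```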

Last day of the month with a twist in SQLPLUS

I would appreciate a little expert help please.
In an SQL SELECT statement, I am trying to get the last day with data in each month for the last year.
For example, I can easily get the last day of each month and join that to my data table, but if the last day of the month has no data, nothing is returned. What I need is for the SELECT to return the last day that has data in each month.
This is probably easy to do, but to be honest, my brain fart is starting to hurt.
I've attached the select below that works for returning the data for only the last day of the month for the last 12 months.
Thanks in advance for your help!
SELECT fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,fd.column_name
FROM super_table fd,
(SELECT TRUNC(daterange,'MM')-1 first_of_month
FROM (
select TRUNC(sysdate-365,'MM') + level as DateRange
from dual
connect by level<=365)
GROUP BY TRUNC(daterange,'MM')) fom
WHERE fd.cust_id = :CUST_ID
AND fd.coll_date > SYSDATE-400
AND TRUNC(fd.coll_date) = fom.first_of_month
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)
You probably need to group your data so that each month's data is in the group, and then within the group select the maximum date present. The sub-query might be:
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY YEAR(coll_date) * 100 + MONTH(coll_date);
This presumes that the functions YEAR() and MONTH() exist to extract the year and month from a date as integer values. Clearly, this doesn't constrain the range of dates; you can do that, too. Oracle doesn't have these functions, so there you need some other manipulation to get the equivalent result (for example, EXTRACT(YEAR FROM coll_date) and EXTRACT(MONTH FROM coll_date)).
Using information from Rhose (thanks):
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table fd
GROUP BY TO_CHAR(coll_date, 'YYYYMM');
This achieves the same net result, putting all dates from the same calendar month into a group and then determining the maximum value present within that group.
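A small sketch of the grouping idea, run in SQLite via Python rather than Oracle: strftime('%Y-%m', ...) stands in for TO_CHAR(coll_date, 'YYYYMM') and date() for TRUNC(). The rows are hypothetical, chosen so that January has no data on its last calendar day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE super_table (cust_id INTEGER, coll_date TEXT)")
conn.executemany(
    "INSERT INTO super_table VALUES (?, ?)",
    [
        (1, "2018-01-15 09:00:00"),
        (1, "2018-01-29 14:00:00"),  # January's last day with data (nothing on the 31st)
        (1, "2018-02-10 08:30:00"),
        (1, "2018-02-27 23:59:00"),
        (1, "2018-03-31 12:00:00"),
    ],
)

last_days = [
    day
    for (day,) in conn.execute(
        """
        SELECT date(MAX(coll_date))
        FROM super_table
        GROUP BY strftime('%Y-%m', coll_date)
        ORDER BY 1
        """
    )
]
print(last_days)  # ['2018-01-29', '2018-02-27', '2018-03-31']
```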
Here's another approach, if ANSI row_number() is supported:
with RevDayRanked(itemDate,rn) as (
select
cast(coll_date as date),
row_number() over (
partition by datediff(month,coll_date,'2000-01-01') -- rewrite datediff as needed for your platform
order by coll_date desc
)
from super_table
)
select itemDate
from RevDayRanked
where rn = 1;
Rows numbered 1 will be nondeterministically chosen among rows on the last active date of the month, so you don't need distinct. If you want information out of the table for all rows on these dates, use rank() over days instead of row_number() over coll_date values, so a value of 1 appears for any row on the last active date of the month, and select the additional columns you need:
with RevDayRanked(cust_id, server_name, coll_date, rk) as (
select
cust_id, server_name, coll_date,
rank() over (
partition by datediff(month,coll_date,'2000-01-01')
order by cast(coll_date as date) desc
)
from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;
If row_number() and rank() aren't supported, another approach is this (for the second query above). Select all rows from your table for which there's no row in the table from a later day in the same month.
select
cust_id, server_name, coll_date
from super_table as ST1
where not exists (
select *
from super_table as ST2
where datediff(month,ST1.coll_date,ST2.coll_date) = 0
and cast(ST2.coll_date as date) > cast(ST1.coll_date as date)
)
If you have to do this kind of thing a lot, see if you can create an index over computed columns that hold cast(coll_date as date) and a month indicator like datediff(month,'2001-01-01',coll_date). That'll make more of the predicates SARGs.
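The rank()-based query can be checked in SQLite via Python (assumption: SQLite 3.25+). Here strftime('%Y-%m', coll_date) replaces the datediff(month, ...) partition key and date() replaces cast(... as date); the rows are hypothetical. Ranking by day rather than by timestamp is what lets both rows from January 29 come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE super_table (cust_id INTEGER, server_name TEXT, coll_date TEXT)")
conn.executemany(
    "INSERT INTO super_table VALUES (?, ?, ?)",
    [
        (1, "srvA", "2018-01-15 09:00:00"),
        (1, "srvA", "2018-01-29 08:00:00"),
        (1, "srvB", "2018-01-29 17:30:00"),
        (1, "srvB", "2018-02-10 12:00:00"),
        (1, "srvA", "2018-02-27 12:00:00"),
    ],
)

rows = conn.execute(
    """
    SELECT cust_id, server_name, coll_date
    FROM (
        SELECT cust_id, server_name, coll_date,
               RANK() OVER (
                   PARTITION BY strftime('%Y-%m', coll_date)
                   ORDER BY date(coll_date) DESC
               ) AS rk
        FROM super_table
    )
    WHERE rk = 1
    ORDER BY coll_date
    """
).fetchall()

for row in rows:
    print(row)
```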
Putting the above pieces together, would something like this work for you?
SELECT fd.cust_id,
fd.server_name,
fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,
fd.column_name
FROM super_table fd
WHERE fd.cust_id = :CUST_ID
AND TRUNC(fd.coll_date) IN (
SELECT MAX(TRUNC(coll_date))
FROM super_table
WHERE coll_date > SYSDATE - 400
AND cust_id = :CUST_ID
GROUP BY TO_CHAR(coll_date,'YYYYMM')
)
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)