Determining total of consecutive values using SQL

I would like to determine the number of consecutive absences as per the following table. Initial research suggests I may be able to achieve this using a window function. For the data provided, the longest streak is four consecutive occurrences. Please can you advise how I can set a running absence total as a separate column.
create table events (eventdate date, absence int);
insert into events values ('2014-10-01', 0);
insert into events values ('2014-10-08', 1);
insert into events values ('2014-10-15', 1);
insert into events values ('2014-10-22', 0);
insert into events values ('2014-11-05', 0);
insert into events values ('2014-11-12', 1);
insert into events values ('2014-11-19', 1);
insert into events values ('2014-11-26', 1);
insert into events values ('2014-12-03', 1);
insert into events values ('2014-12-10', 0);

Based on Gordon Linoff's answer to a similar question, you could do:
SELECT TOP 1
       MIN(eventdate) AS spanStart,
       MAX(eventdate) AS spanEnd,
       COUNT(*) AS spanLength
FROM (SELECT e.*,
             ROW_NUMBER() OVER (ORDER BY eventdate)
           - ROW_NUMBER() OVER (PARTITION BY absence ORDER BY eventdate) AS grp
      FROM events e
     ) t
GROUP BY grp,
         absence
HAVING absence = 1
ORDER BY COUNT(*) DESC;
Which returns:
spanStart | spanEnd | spanLength
---------------------------------------
2014-11-12 |2014-12-03 | 4
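To try the ROW_NUMBER() difference ("gaps and islands") trick without SQL Server, here is a minimal sketch that runs the same idea in SQLite through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions) and ISO text dates, and it swaps TOP 1 for LIMIT 1 and the HAVING filter for a WHERE; the function name longest_absence_streak is just for illustration.

```python
import sqlite3

# Sample data from the question: (eventdate, absence)
ROWS = [
    ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
    ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
    ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
    ("2014-12-10", 0),
]

# Inside a run of equal absence values both row numbers advance together,
# so every row of that run gets the same grp; separate runs get different grps.
LONGEST_STREAK_SQL = """
SELECT MIN(eventdate) AS span_start,
       MAX(eventdate) AS span_end,
       COUNT(*)       AS span_length
FROM (SELECT e.*,
             ROW_NUMBER() OVER (ORDER BY eventdate)
           - ROW_NUMBER() OVER (PARTITION BY absence ORDER BY eventdate) AS grp
      FROM events e) t
WHERE absence = 1
GROUP BY grp
ORDER BY span_length DESC
LIMIT 1
"""


def longest_absence_streak(rows):
    """Return (span_start, span_end, span_length) of the longest run of absences."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (eventdate TEXT, absence INTEGER)")
        con.executemany("INSERT INTO events VALUES (?, ?)", rows)
        return con.execute(LONGEST_STREAK_SQL).fetchone()
    finally:
        con.close()


print(longest_absence_streak(ROWS))  # ('2014-11-12', '2014-12-03', 4)
```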

You don't specify which RDBMS you are using, but the following works with PostgreSQL's window functions and should be translatable to similar SQL engines:
SELECT eventdate,
       absence,
       -- XXX We take advantage of the fact that absence is an int (1 or 0);
       -- otherwise we'd COUNT(1) OVER (...) and only conditionally
       -- display the count if absence = 1
       SUM(absence) OVER (PARTITION BY span ORDER BY eventdate)
           AS consecutive_absences
FROM (SELECT spanstarts.*,
             SUM(newspan) OVER (ORDER BY eventdate) AS span
      FROM (SELECT events.*,
                   CASE LAG(absence) OVER (ORDER BY eventdate)
                       WHEN absence THEN NULL
                       ELSE 1
                   END AS newspan
            FROM events) spanstarts
     ) eventsspans
ORDER BY eventdate;
which gives you:
eventdate | absence | consecutive_absences
------------+---------+----------------------
2014-10-01 | 0 | 0
2014-10-08 | 1 | 1
2014-10-15 | 1 | 2
2014-10-22 | 0 | 0
2014-11-05 | 0 | 0
2014-11-12 | 1 | 1
2014-11-19 | 1 | 2
2014-11-26 | 1 | 3
2014-12-03 | 1 | 4
2014-12-10 | 0 | 0
There is an excellent dissection of the above approach on the pgsql-general mailing list. The short of it is:
1. The innermost query (spanstarts) uses LAG to find the start of each new span, whether a span of 1's or a span of 0's.
2. The next query (eventsspans) numbers those spans by summing the number of new spans that have come before each row. So we find span 1, then span 2, then 3, etc.
3. The outer query then counts the number of absences in each span.
As the SQL comment says, we cheat a little on step 3 by taking advantage of absence's data type, but the net effect is the same.
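A sketch of the same three steps in SQLite, run from Python's sqlite3 module (assuming SQLite 3.25+ for window functions). One deliberate change from the PostgreSQL query: LAG is computed in its own subquery, and the span-start flag uses 0 instead of NULL, which sums the same way. consecutive_absences is a hypothetical helper name.

```python
import sqlite3

# Sample data from the question: (eventdate, absence)
ROWS = [
    ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
    ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
    ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
    ("2014-12-10", 0),
]

RUNNING_SQL = """
SELECT eventdate,
       absence,
       -- 3. running sum of absence inside each span (stays 0 for spans of presences)
       SUM(absence) OVER (PARTITION BY span ORDER BY eventdate) AS consecutive_absences
FROM (SELECT eventdate, absence,
             -- 2. number the spans with a running total of span starts
             SUM(newspan) OVER (ORDER BY eventdate) AS span
      FROM (SELECT eventdate, absence,
                   -- 1. a row starts a new span when it differs from the previous row;
                   --    the first row has prev_absence NULL, so it also starts a span
                   CASE WHEN prev_absence = absence THEN 0 ELSE 1 END AS newspan
            FROM (SELECT eventdate, absence,
                         LAG(absence) OVER (ORDER BY eventdate) AS prev_absence
                  FROM events) lagged
           ) spanstarts
     ) eventsspans
ORDER BY eventdate
"""


def consecutive_absences(rows):
    """Return the running consecutive-absence count for each row, in date order."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (eventdate TEXT, absence INTEGER)")
        con.executemany("INSERT INTO events VALUES (?, ?)", rows)
        return [count for _, _, count in con.execute(RUNNING_SQL)]
    finally:
        con.close()


print(consecutive_absences(ROWS))  # [0, 1, 2, 0, 0, 1, 2, 3, 4, 0]
```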

I don't know which DBMS you are using, but this is for SQL Server. It counts how many times there were two or more absences in a row. Hopefully it is of some help :)
-------------------------------------------------------------------------------------------
Query:
-- tableRN numbers the rows in date order
;WITH tableRN AS (
    SELECT e.eventdate, e.absence,
           ROW_NUMBER() OVER (ORDER BY e.eventdate) AS rn
    FROM events e
),
-- cte is a recursive CTE that walks the rows in order and returns:
-- absence value, level (consecutive absences so far, minus one),
-- rn (row number) and total (streaks of two or more absences so far)
cte (absence, level, rn, total) AS (
    -- anchor: the first real row
    SELECT absence, 0, rn, 0
    FROM tableRN
    WHERE rn = 1
    UNION ALL
    SELECT r.absence,
           CASE WHEN c.absence = 1 AND r.absence = 1 THEN c.level + 1
                ELSE 0
           END,
           r.rn,
           -- count each streak once, when it reaches its second absence
           CASE WHEN c.absence = 1 AND r.absence = 1 AND c.level = 0 THEN c.total + 1
                ELSE c.total
           END
    FROM cte c
    JOIN tableRN r ON r.rn = c.rn + 1
)
-- This gets you the total number of times there were consecutive
-- absences (two or more in a row).
SELECT MAX(c.total) AS Count
FROM cte c
OPTION (MAXRECURSION 0); -- lift the default limit of 100 recursion levels
-------------------------------------------------------------------------------------------
Results:
|Count|
+-----+
| 2 |
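For comparison, here is a sketch of the recursive-CTE idea in SQLite (3.25+), driven from Python's sqlite3 module. It is an adaptation rather than a verbatim port: it anchors on the first real row instead of a dummy row, and it counts a streak at the moment the streak reaches its second consecutive absence, so a streak that ends on the last row is still counted. count_absence_streaks is a hypothetical name.

```python
import sqlite3

# Sample data from the question: (eventdate, absence)
ROWS = [
    ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
    ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
    ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
    ("2014-12-10", 0),
]

STREAKS_SQL = """
WITH RECURSIVE
tableRN AS (
    SELECT absence, ROW_NUMBER() OVER (ORDER BY eventdate) AS rn
    FROM events
),
cte(absence, level, rn, total) AS (
    -- anchor: the first real row
    SELECT absence, 0, rn, 0 FROM tableRN WHERE rn = 1
    UNION ALL
    -- step to the next row, extending or resetting the current streak
    SELECT r.absence,
           CASE WHEN c.absence = 1 AND r.absence = 1 THEN c.level + 1 ELSE 0 END,
           r.rn,
           CASE WHEN c.absence = 1 AND r.absence = 1 AND c.level = 0
                THEN c.total + 1 ELSE c.total END
    FROM cte c JOIN tableRN r ON r.rn = c.rn + 1
)
SELECT MAX(total) FROM cte
"""


def count_absence_streaks(rows):
    """Number of runs of two or more consecutive absences."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (eventdate TEXT, absence INTEGER)")
        con.executemany("INSERT INTO events VALUES (?, ?)", rows)
        (total,) = con.execute(STREAKS_SQL).fetchone()
        return total
    finally:
        con.close()


print(count_absence_streaks(ROWS))  # 2
```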

Create a new column called consecutive_absence_count with default 0.
You could write an SQL procedure for inserts: fetch the latest record, retrieve its absence value, and check whether the new record to be inserted is a presence or an absence.
If the new record is an absence (absence = 1), set its consecutive_absence_count to the latest record's count plus 1 (the latest record's count is 0 if it was a presence); otherwise set it to 0.
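The answer describes an insert procedure; as a swapped-in alternative, here is a minimal sketch using an SQLite AFTER INSERT trigger (run from Python's sqlite3 module) that applies the same rule to every new row. It assumes rows are inserted in date order: a back-dated insert would not recalculate the rows after it. The trigger name and helper function are hypothetical.

```python
import sqlite3

# Sample data from the question: (eventdate, absence)
ROWS = [
    ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
    ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
    ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
    ("2014-12-10", 0),
]

SCHEMA = """
CREATE TABLE events (
    eventdate TEXT PRIMARY KEY,
    absence INTEGER NOT NULL,
    consecutive_absence_count INTEGER NOT NULL DEFAULT 0
);

-- Hypothetical trigger: after each insert, derive the new row's count
-- from the latest earlier row (treated as 0 if there is none).
CREATE TRIGGER events_count_absences AFTER INSERT ON events
BEGIN
    UPDATE events
    SET consecutive_absence_count =
        CASE WHEN NEW.absence = 1
             THEN COALESCE((SELECT prev.consecutive_absence_count
                            FROM events AS prev
                            WHERE prev.eventdate < NEW.eventdate
                            ORDER BY prev.eventdate DESC
                            LIMIT 1), 0) + 1
             ELSE 0
        END
    WHERE eventdate = NEW.eventdate;
END;
"""


def counts_after_inserts(rows):
    """Insert rows one at a time (in date order) and return the stored counts."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        for eventdate, absence in rows:
            con.execute("INSERT INTO events (eventdate, absence) VALUES (?, ?)",
                        (eventdate, absence))
        return [c for (c,) in con.execute(
            "SELECT consecutive_absence_count FROM events ORDER BY eventdate")]
    finally:
        con.close()


print(counts_after_inserts(ROWS))  # [0, 1, 2, 0, 0, 1, 2, 3, 4, 0]
```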

Related

Optimize the query of weekday statistics between two dates

I have a table with two fields: start_date and end_date. Now I want to count the working days in each record's date range in order to work out overtime. I have created a new calendar table to maintain the working day status of each date.
table: workdays
id         | status
-----------+-------
2020-01-01 | 4
2020-01-02 | 1
2020-01-03 | 1
2020-01-04 | 2
4: holiday, 1: weekday, 2: weekend
I created a function to calculate the weekdays between two dates (excluding weekends, holidays).
create or replace function get_workday_count (start_date in date, end_date in date)
return number is
day_count int;
begin
select count(0) into day_count from WORKDAYS
where TRUNC(ID) >= TRUNC(start_date)
and TRUNC(ID) <= TRUNC(end_date)
and status in (1, 3, 5);
return day_count;
end;
When I execute the following query, it takes about 5 minutes to display the results; the erp_sj table has about 200,000 rows.
select count(0) from ERP_SJ where GET_WORKDAY_COUNT(start_date, end_date) > 5;
The fields used in query statements are indexed.
How to optimize? Or is there a better solution?
First of all, optimize your function:
1. Add pragma udf (for faster execution when called from SQL)
2. Add the deterministic clause (for caching)
3. Replace count(0) with count(*) (to let the CBO optimize the count)
4. Replace return number with int
5. Remove TRUNC() from ID (so the index on ID can be used)
create or replace function get_workday_count (start_date in date, end_date in date)
return int deterministic is
pragma udf;
day_count int;
begin
select count(*) into day_count from WORKDAYS w
where w.ID >= TRUNC(start_date)
and w.ID <= TRUNC(end_date)
and status in (1, 3, 5);
return day_count;
end;
Next, you don't need to call your function at all when (end_date - start_date) is less than the required number of days:
select count(*)
from ERP_SJ
where
case
when trunc(end_date) - trunc(start_date) > 5
then GET_WORKDAY_COUNT(trunc(start_date) , trunc(end_date))
else 0
end > 5
Or, ideally, use a scalar subquery instead of the function:
select count(*)
from ERP_SJ e
where
case
when trunc(end_date) - trunc(start_date) > 5
then (select count(*) from WORKDAYS w
where w.ID >= TRUNC(e.start_date)
and w.ID <= TRUNC(e.end_date)
and w.status in (1, 3, 5))
else 0
end > 5
WORKDAY_STATUSES table (just for completeness, not used below):
create table workday_statuses
( status number(1) constraint workday_statuses_pk primary key
, status_name varchar2(10) not null constraint workday_status_name_uk unique );
insert all
into workday_statuses values (1, 'Weekday')
into workday_statuses values (2, 'Weekend')
into workday_statuses values (3, 'Unknown 1')
into workday_statuses values (4, 'Holiday')
into workday_statuses values (5, 'Unknown 2')
select * from dual;
WORKDAYS table: one row for each day in 2020:
create table workdays
( id date constraint workdays_pk primary key
, status references workday_statuses not null )
organization index;
insert into workdays (id, status)
select date '2019-12-31' + rownum
, case
when to_char(date '2019-12-31' + rownum, 'Dy', 'nls_language = English') like 'S%' then 2
when date '2019-12-31' + rownum in
( date '2020-01-01', date '2020-04-10', date '2020-04-13'
, date '2020-05-08', date '2020-05-25', date '2020-08-31'
, date '2020-12-25', date '2020-12-26', date '2020-12-28' ) then 4
else 1
end
from xmltable('1 to 366')
where date '2019-12-31' + rownum < date '2021-01-01';
ERP_SJ table containing 30K rows with random data:
create table erp_sj
( id integer generated always as identity
, start_date date not null
, end_date date not null
, filler varchar2(100) );
insert into erp_sj (start_date, end_date, filler)
select dt, dt + dbms_random.value(0,7), dbms_random.string('x',100)
from ( select date '2019-12-31' + dbms_random.value(1,366) as dt
from xmltable('1 to 30000') );
commit;
get_workday_count() function:
create or replace function get_workday_count
( start_date in date, end_date in date )
return integer
deterministic -- Cache some results
parallel_enable -- In case you want to use it in parallel queries
as
pragma udf; -- Tell compiler to optimise for SQL
day_count integer;
begin
select count(*) into day_count
from workdays w
where w.id between trunc(start_date) and end_date
and w.status in (1, 3, 5);
return day_count;
end;
Notice that you should not truncate w.id, because all values have the time as 00:00:00 already. (I'm assuming that if end_date falls somewhere in the middle of a day, you want to count that day, so I have not truncated the end_date parameter.)
Test:
select count(*) from erp_sj
where get_workday_count(start_date, end_date) > 5;
COUNT(*)
--------
1302
Results returned in around 1.4 seconds.
Execution plan for the query within the function:
select count(*)
from workdays w
where w.id between trunc(sysdate) and sysdate +10
and w.status in (1, 3, 5);
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1 |
|* 2 | FILTER | | 1 | | 7 |00:00:00.01 | 1 |
|* 3 | INDEX RANGE SCAN| WORKDAYS_PK | 1 | 7 | 7 |00:00:00.01 | 1 |
--------------------------------------------------------------------------------------------
Now try adding the function as a virtual column and indexing it:
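alter table erp_sj add (workday_count integer generated always as (get_workday_count(start_date, end_date)) virtual);
create index erp_sj_workday_count_ix on erp_sj(workday_count);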
select count(*) from erp_sj
where workday_count > 5;
Same result in 0.035 seconds. Plan:
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 5 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 5 |
|* 2 | INDEX RANGE SCAN| ERP_SJ_WORKDAY_COUNT_IX | 1 | 1302 | 1302 |00:00:00.01 | 5 |
-------------------------------------------------------------------------------------------------------
Tested in 19.0.0.
Edit: As Sayan pointed out, the index on the virtual column won't be automatically updated if there are any changes in WORKDAYS, so there is a risk of wrong results with this approach. However, if performance is critical you could work around it by rebuilding the index on ERP_SJ every time you updated WORKDAYS. Maybe you could do this in a statement-level trigger on WORKDAYS, or just through scheduled IT maintenance processes if updates are very infrequent and ERP_SJ isn't so big that an index rebuild is impractical. If the index is partitioned, rebuilding affected partitions could be an option.
Or, don't have an index and live with the 1.4 seconds query execution time.
I understand that the columns ID and status have indexes on them (not a function-based index on TRUNC(ID)). So use this query
SELECT count(0)
INTO day_count
FROM WORKDAYS
WHERE ID BETWEEN TRUNC(start_date) AND TRUNC(end_date)
AND status in (1, 3, 5);
so that the index on the date column ID can be used as well.
Maybe try scalar subquery caching
(useful if there are plenty of erp_sj records with the same start_date and end_date):
select count(0) from ERP_SJ where
(select GET_WORKDAY_COUNT(start_date, end_date) from dual) > 5
You are dealing with a data warehouse query (not an OLTP query).
Some best practices say you should:
get rid of functions - avoid context switches (this can be mitigated somewhat with the UDF pragma, but why use a function if you don't need one?)
get rid of indexes - quick for a few rows; slow for a large number of records
get rid of caching - caching is basically a workaround for repeating the same thing
So the data warehouse approach to the problem consists of two steps:
Extend the Workday Table
With a small query, the workday table can be extended with a new column MIN_END_DAY that gives, for each (start) day, the earliest end day at which the range reaches 5 working days.
The query uses the LEAD analytic function to get the 4th following working day (note the PARTITION BY clause, which separates the working days from the other days). An offset of 4 finds ranges with at least 5 working days; for the question's strict > 5 condition, use an offset of 5.
For the non-working days you simply take the MIN_END_DAY of the next working day, using LAST_VALUE ... IGNORE NULLS over a descending date order.
Example
with wd as (
select ID, STATUS,
case when status in (1, 3, 5) then
lead(id,4) over (partition by case when status in (1, 3, 5) then 'workday' end order by id) /* 4 working days ahead */
end as min_stop_day
from workdays),
wd2 as (
select ID, STATUS,
last_value(MIN_STOP_DAY) ignore nulls over (order by id desc) MIN_END_DAY
from wd)
select ID, STATUS, MIN_END_DAY
from wd2
order by 1;
ID, STATUS, MIN_END_DAY
01.01.2020 00:00:00 4 08.01.2020 00:00:00
02.01.2020 00:00:00 1 08.01.2020 00:00:00
03.01.2020 00:00:00 1 09.01.2020 00:00:00
04.01.2020 00:00:00 2 10.01.2020 00:00:00
05.01.2020 00:00:00 2 10.01.2020 00:00:00
06.01.2020 00:00:00 1 10.01.2020 00:00:00
Join to the Base Table
Now you can simply join your base table with the extended workday table on the start day and filter the rows by comparing end_date with MIN_END_DAY.
Query
with wd as (
select ID, STATUS,
case when status in (1, 3, 5) then
lead(id,4) over (partition by case when status in (1, 3, 5) then 'workday' end order by id)
end as min_stop_day
from workdays),
wd2 as (
select ID, STATUS,
last_value(MIN_STOP_DAY) ignore nulls over (order by id desc) MIN_END_DAY
from wd)
select count(*) from erp_sj
join wd2
on trunc(erp_sj.start_date) = wd2.ID
where trunc(erp_sj.end_date) >= wd2.min_end_day
For large tables this leads to the expected HASH JOIN execution plan.
Note that I assume 1) the workday table is complete (otherwise you can't use inner join) and 2) contains enough future data (the last 5 rows are obviously not usable).
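The idea behind this approach (precompute, once per calendar day, the earliest end day that reaches the threshold, then do a single comparison per row) can be checked in plain Python. This sketch uses a hypothetical in-memory calendar (weekends plus two holidays) in place of the WORKDAYS table, bisect in place of LEAD/LAST_VALUE, and random ranges, and it compares the result against direct per-row counting for the question's > 5 condition.

```python
import random
from bisect import bisect_left
from datetime import date, timedelta

WORKING_STATUSES = {1, 3, 5}  # statuses the question treats as working days


def make_calendar(year=2020, holidays=(date(2020, 1, 1), date(2020, 12, 25))):
    """Hypothetical WORKDAYS stand-in: 1 = weekday, 2 = weekend, 4 = holiday."""
    cal = {}
    day = date(year, 1, 1)
    while day.year == year:
        if day in holidays:
            cal[day] = 4
        elif day.weekday() >= 5:
            cal[day] = 2
        else:
            cal[day] = 1
        day += timedelta(days=1)
    return cal


def build_min_end_day(cal, n):
    """Map each day to the n-th working day on or after it (None if the calendar runs out)."""
    working = sorted(d for d, s in cal.items() if s in WORKING_STATUSES)
    result = {}
    for day in cal:
        j = bisect_left(working, day) + n - 1  # index of the n-th working day >= day
        result[day] = working[j] if j < len(working) else None
    return result


def count_direct(cal, ranges, more_than):
    """Baseline: count working days per range, as GET_WORKDAY_COUNT does."""
    working = [d for d, s in cal.items() if s in WORKING_STATUSES]
    return sum(1 for start, end in ranges
               if sum(1 for d in working if start <= d <= end) > more_than)


def count_with_lookup(cal, ranges, more_than):
    """Precompute the threshold per start day, then one comparison per row."""
    min_end = build_min_end_day(cal, more_than + 1)  # "> 5" means "at least 6"
    return sum(1 for start, end in ranges
               if min_end[start] is not None and end >= min_end[start])


cal = make_calendar()
rng = random.Random(42)
days = sorted(cal)
ranges = []
for _ in range(2000):
    start = rng.choice(days)
    ranges.append((start, start + timedelta(days=rng.randint(0, 10))))

print(count_direct(cal, ranges, 5), count_with_lookup(cal, ranges, 5))
```

With n = 5 (offset 4), the lookup gives 2020-01-08 for 2020-01-02 and 2020-01-10 for 2020-01-04, matching the sample MIN_END_DAY output above.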

getting count based on sum of values in other columns SQL

PID | ChildPID | value
------|-----------|-------
3835 | 3934 | 1
3835 | 3935 | 0
3835 | 3936 | 0
3835 | 3939 | 1
3836 | 3940 | 0
3836 | 3941 | 0
3836 | 3942 | 0
and I need results like:
PIDCountinvalue| Childcountinvalue | PIDCountoutvalue| Childcountoutvalue
---------------|--------------------|-----------------|-------------------|
1 | 2 | 1 | 5
That is, I need to count PIDs and ChildPIDs based on the sum of their values. If every ChildPID belonging to a PID has value 0, that PID is counted in PIDCountoutvalue; if the sum over all of its ChildPIDs is > 0, it is counted in PIDCountinvalue. Childcountinvalue and Childcountoutvalue are simply based on each ChildPID's own value.
Explanation:
PIDCountinvalue| Childcountinvalue | PIDCountoutvalue| Childcountoutvalue
---------------|--------------------|-----------------|-------------------|
1(3835) | 2 (3934,3939) | 1 (3836) |5(3935,3936,3940,3941,3942)
There are two PIDs in total (3835, 3836). PID 3835 has 4 ChildPIDs (3934, 3935, 3936, 3939) whose values sum to > 0, and PID 3836 has 3 ChildPIDs (3940, 3941, 3942) whose values sum to 0. So, summing the values of the ChildPIDs under their respective PID: if the sum is 0, the PID counts towards PIDCountoutvalue, otherwise towards PIDCountinvalue. That is why 3836 is in the out-value count and 3835 is in the in-value count.
Your explanation is not very clear and I can't quite follow your logic, but here's some example code that may help. You can easily modify it to fit your needs.
CREATE TABLE Table1
([PID] varchar(6), [ChildPID] varchar(11), [value] int)
;
INSERT INTO Table1
([PID], [ChildPID], [value])
VALUES
('3835', '3934', '1'),
('3835', '3935', '0'),
('3835', '3936', '0'),
('3835', '3939', '1'),
('3836', '3940', '0'),
('3836', '3941', '0'),
('3836', '3942', '0')
;
--You can use a common table expression to "build" a table and then query what you need.
with
cte as
(
select PID, ChildPID, sum(value) [Sum]
from Table1
group by PID, ChildPID
)
--Now just build your columns with the aggregated data you need.
select
(select
count(distinct PID)
from cte
where [Sum] > 0) as PIDCountinvalue --Count of PIDs that have at least one ChildPID with a value greater than 0
,(select
count(ChildPID)
from cte
where [Sum] > 0) as Childcountinvalue --Count of ChildPIDs with values greater than 0
,(select count(*) from
(select
PID
from cte
group by PID
having sum([Sum]) <= 0) I) as PIDCountoutvalue --Count of PIDs whose values sum to 0 or less
,(select
count(ChildPID)
from cte
where [Sum] <= 0) as Childcountoutvalue --Count of ChildPIDs with values of 0 or less
The output of that query is exactly what you've wanted. However, since your logic is not clear to me, this may not be the final solution for your complete data. But I hope it helps at least a bit.
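Another option, sketched here in SQLite via Python's sqlite3 module rather than SQL Server, is conditional aggregation: aggregate once per PID, then add up the per-PID figures in a single pass. It assumes values are never negative (so a PID is "out" exactly when its sum is 0) and that each (PID, ChildPID) pair appears once; pid_counts is a hypothetical name.

```python
import sqlite3

# Sample data from the question: (PID, ChildPID, value)
ROWS = [
    (3835, 3934, 1), (3835, 3935, 0), (3835, 3936, 0), (3835, 3939, 1),
    (3836, 3940, 0), (3836, 3941, 0), (3836, 3942, 0),
]

COUNTS_SQL = """
SELECT SUM(CASE WHEN pid_sum > 0 THEN 1 ELSE 0 END) AS PIDCountinvalue,
       SUM(children_in)                             AS Childcountinvalue,
       SUM(CASE WHEN pid_sum > 0 THEN 0 ELSE 1 END) AS PIDCountoutvalue,
       SUM(children_out)                            AS Childcountoutvalue
FROM (SELECT PID,
             SUM(value) AS pid_sum,                               -- total per PID
             SUM(CASE WHEN value > 0 THEN 1 ELSE 0 END) AS children_in,
             SUM(CASE WHEN value > 0 THEN 0 ELSE 1 END) AS children_out
      FROM Table1
      GROUP BY PID) per_pid
"""


def pid_counts(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Table1 (PID INTEGER, ChildPID INTEGER, value INTEGER)")
        con.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
        return con.execute(COUNTS_SQL).fetchone()
    finally:
        con.close()


print(pid_counts(ROWS))  # (1, 2, 1, 5)
```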

Unable to calculate difference between CTE subquery outputs for use in larger PostgreSQL query output column

Using PostgreSQL v9.4.5 from Shell I created a database called moments in psql by running create database moments. I then created a moments table:
CREATE TABLE moments
(
id SERIAL4 PRIMARY KEY,
moment_type BIGINT NOT NULL,
flag BIGINT NOT NULL,
time TIMESTAMP NOT NULL,
UNIQUE(moment_type, time)
);
INSERT INTO moments (moment_type, flag, time) VALUES (1, 7, '2016-10-29 12:00:00');
INSERT INTO moments (moment_type, flag, time) VALUES (1, -30, '2016-10-29 13:00:00');
INSERT INTO moments (moment_type, flag, time) VALUES (3, 5, '2016-10-29 14:00:00');
INSERT INTO moments (moment_type, flag, time) VALUES (2, 9, '2016-10-29 18:00:00');
INSERT INTO moments (moment_type, flag, time) VALUES (2, -20, '2016-10-29 17:00:00');
INSERT INTO moments (moment_type, flag, time) VALUES (3, 10, '2016-10-29 16:00:00');
I run select * from moments to view the table:
Moments Table
id | moment_type | flag | time
----+-------------+------+---------------------
1 | 1 | 7 | 2016-10-29 12:00:00
2 | 1 | -30 | 2016-10-29 13:00:00
3 | 3 | 5 | 2016-10-29 14:00:00
4 | 2 | 9 | 2016-10-29 18:00:00
5 | 2 | -20 | 2016-10-29 17:00:00
6 | 3 | 10 | 2016-10-29 16:00:00
I then try to write an SQL query that produces the following output: for each pair of rows sharing a moment_type value, it returns the difference between the flag value of the row with the most recent timestamp and the flag value of the row with the second most recent timestamp, and lists the results in ascending order of moment_type.
Expected SQL Query Output
moment_type | flag |
------------+------+
1 | -37 | (i.e. -30 - 7)
2 | 29 | (i.e. 9 - -20)
3 | 5 | (i.e. 10 - 5)
The SQL query that I came up with is as follows. It uses a WITH query to write multiple Common Table Expression (CTE) subqueries for use as temporary tables in the larger SELECT query at the end. I also use an SQL function to calculate the difference between two of the subquery outputs (alternatively, I think I could have just used DIFFERENCE(most_recent_flag, second_most_recent_flag) AS flag instead of the function):
CREATE FUNCTION difference(most_recent_flag, second_most_recent_flag) RETURNS numeric AS $$
SELECT $1 - $2;
$$ LANGUAGE SQL;
-- get two flags that have the most recent timestamps
WITH two_most_recent_flags AS (
SELECT moments.flag
FROM moments
ORDER BY moments.time DESC
LIMIT 2
),
-- get one flag that has the most recent timestamp
most_recent_flag AS (
SELECT *
FROM two_most_recent_flags
ORDER BY flag DESC
LIMIT 1
),
-- get one flag that has the second most recent timestamp
second_most_recent_flag AS (
SELECT *
FROM two_most_recent_flags
ORDER BY flag ASC
LIMIT 1
)
SELECT DISTINCT ON (moments.moment_type)
moments.moment_type,
difference(most_recent_flag, second_most_recent_flag) AS flag
FROM moments
ORDER BY moment_type ASC
LIMIT 2;
But when I run the above SQL query in PostgreSQL, it returns the following error:
ERROR: column "most_recent_flag" does not exist
LINE 21: difference(most_recent_flag, second_most_recent_flag) AS fla...
Question
What techniques can I use and how may I apply them to overcome this error, and calculate and display the differences in the flag column to achieve the Expected SQL Query Output?
Note: Perhaps the Window Function may be used somehow as it performs calculations across table rows
Use the lag() window function:
select moment_type, difference
from (
    select *, flag - lag(flag) over w as difference
    from moments
    window w as (partition by moment_type order by time)
    ) s
where difference is not null
order by moment_type;
moment_type | difference
-------------+------------
1 | -37
2 | 29
3 | 5
(3 rows)
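To check the lag() answer quickly without a PostgreSQL server, here is a sketch in SQLite (3.25+ for window functions) via Python's sqlite3 module. The time column is renamed to moment_time and timestamps are stored as ISO text; both are assumptions of this sketch, not part of the original schema, and flag_differences is a hypothetical name.

```python
import sqlite3

# Rows from the question: (moment_type, flag, time)
MOMENTS = [
    (1, 7, "2016-10-29 12:00:00"),
    (1, -30, "2016-10-29 13:00:00"),
    (3, 5, "2016-10-29 14:00:00"),
    (2, 9, "2016-10-29 18:00:00"),
    (2, -20, "2016-10-29 17:00:00"),
    (3, 10, "2016-10-29 16:00:00"),
]

# flag minus the previous flag of the same moment_type (by time);
# the earliest row of each type has no previous row, so its difference is NULL.
LAG_SQL = """
SELECT moment_type, difference
FROM (SELECT moment_type,
             flag - LAG(flag) OVER (PARTITION BY moment_type
                                    ORDER BY moment_time) AS difference
      FROM moments) s
WHERE difference IS NOT NULL
ORDER BY moment_type
"""


def flag_differences(moments):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("""CREATE TABLE moments (
            id INTEGER PRIMARY KEY,
            moment_type INTEGER NOT NULL,
            flag INTEGER NOT NULL,
            moment_time TEXT NOT NULL,
            UNIQUE (moment_type, moment_time))""")
        con.executemany(
            "INSERT INTO moments (moment_type, flag, moment_time) VALUES (?, ?, ?)",
            moments)
        return con.execute(LAG_SQL).fetchall()
    finally:
        con.close()


print(flag_differences(MOMENTS))  # [(1, -37), (2, 29), (3, 5)]
```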
One method is to use conditional aggregation. The window function row_number() can be used to identify the first and last time values:
select m.moment_type,
       (max(case when seqnum_desc = 1 then flag end) -
        min(case when seqnum_asc = 1 then flag end)
       ) as flag
from (select m.*,
             row_number() over (partition by m.moment_type order by m.time) as seqnum_asc,
             row_number() over (partition by m.moment_type order by m.time desc) as seqnum_desc
      from moments m
     ) m
group by m.moment_type
order by m.moment_type;

Deterministic sort order for window functions

I have a status table and I want to fetch the latest details.
Slno | ID | Status | date
1 | 1 | Pass | 15-06-2015 11:11:00 - this is inserted first
2 | 1 | Fail | 15-06-2015 11:11:00 - this is inserted second
3 | 2 | Fail | 15-06-2015 12:11:11 - this is inserted first
4 | 2 | Pass | 15-06-2015 12:11:11 - this is inserted second
I use a window function with partition by ID order by date desc to fetch the first value.
Expected Output:
2 | 1 | Fail | 15-06-2015 11:11:00 - this is inserted second
4 | 2 | Pass | 15-06-2015 12:11:11 - this is inserted second
Actual Output:
1 | 1 | Pass | 15-06-2015 11:11:00 - this is inserted first
3 | 2 | Fail | 15-06-2015 12:11:11 - this is inserted first
According to [http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_order_by_WF.html], adding a second ORDER BY column to the window function may solve the problem. But I don't have any other column to differentiate the rows!
Is there another approach to solve the issue?
EDIT: I've added slno here for clarity. I don't have slno as such in the table!
My SQL:
with range as (
select id from status where date between 01-06-2015 and 30-06-2015
), latest as (
select status, id, row_number() OVER (PARTITION BY id ORDER BY date DESC) row_num
)
select * from latest where row_num = 1
If you don't have slno in your table, then you don't have any reliable information which row was inserted first. There is no natural order in a table, the physical order of rows can change any time (with any update, or with VACUUM, etc.)
You could use an unreliable trick: order by the internal ctid.
select *
from (
select id, status
, row_number() OVER (PARTITION BY id
ORDER BY date DESC, ctid DESC) AS row_num
from status -- that's your table name??
where date >= '2015-06-01' -- assuming column is actually a date
and date < '2015-07-01'
) sub
where row_num = 1;
In the absence of any other information about which row came first (which is a design error to begin with, fix it!), you might try to salvage what you can using the internal tuple ID ctid.
In-order sequence generation
Rows will be in physical order when inserted initially, but that can change any time with any write operation to the table or VACUUM or other events.
This is a measure of last resort and it will break.
Your presented query was invalid on several counts: missing column name in 1st CTE, missing table name in 2nd CTE, ...
You don't need a CTE for this.
Simpler with DISTINCT ON (considerations for ctid apply the same):
SELECT DISTINCT ON (id)
id, status
FROM status
WHERE date >= '2015-06-01'
AND date < '2015-07-01'
ORDER BY id, date DESC, ctid DESC;
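The reliable fix the answer recommends is to record insertion order explicitly. Here is a minimal sketch of that in SQLite (via Python's sqlite3 module, 3.25+ for ROW_NUMBER), where an autoincrementing slno column serves as the tie-breaker instead of ctid. The column name status_date is an assumption, chosen to avoid using date as an identifier; latest_per_id is a hypothetical name.

```python
import sqlite3

# (id, status, date) in insertion order, as in the question
ROWS = [
    (1, "Pass", "2015-06-15 11:11:00"),
    (1, "Fail", "2015-06-15 11:11:00"),
    (2, "Fail", "2015-06-15 12:11:11"),
    (2, "Pass", "2015-06-15 12:11:11"),
]

# Latest date first; among equal dates, the most recently inserted row wins.
LATEST_SQL = """
SELECT slno, id, status, status_date
FROM (SELECT s.*,
             ROW_NUMBER() OVER (PARTITION BY id
                                ORDER BY status_date DESC, slno DESC) AS row_num
      FROM status s
      WHERE status_date >= '2015-06-01' AND status_date < '2015-07-01') sub
WHERE row_num = 1
ORDER BY id
"""


def latest_per_id(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("""CREATE TABLE status (
            slno INTEGER PRIMARY KEY AUTOINCREMENT,  -- records insertion order
            id INTEGER NOT NULL,
            status TEXT NOT NULL,
            status_date TEXT NOT NULL)""")
        con.executemany(
            "INSERT INTO status (id, status, status_date) VALUES (?, ?, ?)", rows)
        return con.execute(LATEST_SQL).fetchall()
    finally:
        con.close()


print(latest_per_id(ROWS))
# [(2, 1, 'Fail', '2015-06-15 11:11:00'), (4, 2, 'Pass', '2015-06-15 12:11:11')]
```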
Select first row in each GROUP BY group?

How to create an end date that is one day less than the next start date returned by another query with SQL?

I wrote a query against a table that pulls in anyone who has a working time percentage of less than 100, plus all of their working time records if they meet the less-than-100 criterion.
This table contains the columns: id, eff_date (of working time percentage), and percentage. This table does not contain end_date.
Problem: how can I build on top of the query below and add a new column called end_date that is one day before the next eff_date?
Current query
select
j1.id, j1.eff_date, j1.percentage
from
working_time_table j1
where
exists (select 1
from working_time_table j2
where j2.id = j1.id and j2.percentage < 100)
Data returned from the query above:
ID | EFF_DATE| PERCENTAGE
------------------------
12 | 01-JUN-2012 | 70
12 | 03-MAR-2013 | 100
12 | 13-DEC-2014 | 85
The desired result set is:
ID | EFF_DATE | PERCENTAGE | END_DATE
-------------------------------------------
12 | 01-JUN-2012 | 70 | 02-MAR-2013
12 | 03-MAR-2013 | 100 | 12-DEC-2014
12 | 13-DEC-2014 | 85 | null
You didn't state your DBMS so this is ANSI SQL using window functions:
select j1.id,
j1.eff_date,
j1.percentage,
lead(j1.eff_date) over (partition by j1.id order by j1.eff_date) - interval '1' day as end_date
from working_time_table j1
where exists (select 1
from working_time_table j2
where j2.id = j1.id and j2.percentage < 100);
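The same lead()-minus-one-day idea can be tried in SQLite (3.25+) from Python's sqlite3 module. SQLite has no interval type, so this sketch assumes ISO text dates and uses date(next_eff_date, '-1 day'); id 13 is made-up extra data showing that the EXISTS filter drops ids that never go below 100. end_dates is a hypothetical name.

```python
import sqlite3

# (id, eff_date, percentage); id 13 never drops below 100
ROWS = [
    (12, "2012-06-01", 70),
    (12, "2013-03-03", 100),
    (12, "2014-12-13", 85),
    (13, "2012-01-01", 100),
]

END_DATE_SQL = """
SELECT id, eff_date, percentage,
       date(next_eff_date, '-1 day') AS end_date  -- NULL stays NULL for the last record
FROM (SELECT j1.id, j1.eff_date, j1.percentage,
             LEAD(j1.eff_date) OVER (PARTITION BY j1.id
                                     ORDER BY j1.eff_date) AS next_eff_date
      FROM working_time_table j1
      WHERE EXISTS (SELECT 1
                    FROM working_time_table j2
                    WHERE j2.id = j1.id AND j2.percentage < 100)) t
ORDER BY id, eff_date
"""


def end_dates(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE working_time_table "
                    "(id INTEGER, eff_date TEXT, percentage INTEGER)")
        con.executemany("INSERT INTO working_time_table VALUES (?, ?, ?)", rows)
        return con.execute(END_DATE_SQL).fetchall()
    finally:
        con.close()


for row in end_dates(ROWS):
    print(row)
```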
First off, I'm curious whether the "id" column is unique or has duplicate values like the 12s in your sample, i.e. whether it could be a unique column or primary key. It would be WAAAAY easier to do this if there were a unique id column that held the order. If you don't have a unique ID column, are you able to add one to the table? Again, that would simplify this tremendously.
This took forever to get right; I hope it helps, as I burned many hours on it.
Props to Akhil for helping me finally get the query right. He is a true SQL genius.
Here is the query from the SQLFiddle:
-- User variables number the rows; this relies on the rows coming back
-- in id/eff_date order, which nothing in the query guarantees.
SELECT
    id,
    firstTbl.eff_Date,
    UPPER(DATE_FORMAT(DATE_SUB(
        STR_TO_DATE(secondTbl.eff_Date, '%d-%b-%Y'),
        INTERVAL 1 DAY), '%d-%b-%Y')) todate,
    percentage
FROM
    (SELECT
        (@cnt := @cnt + 1) rownum,
        id, eff_date, percentage
     FROM working_time_table,
        (SELECT @cnt := 0) s) firstTbl
LEFT JOIN
    (SELECT
        (@cnt1 := @cnt1 + 1) rownum,
        eff_date
     FROM working_time_table,
        (SELECT @cnt1 := 0) s) secondTbl
ON (firstTbl.rownum + 1) = secondTbl.rownum;