Comparing consecutive rows using Oracle SQL

I have a table which looks something like this:
| ID | FROM_DATE | TO_DATE |
------------------------------
| 1 | 1/1/2001 | 2/1/2001|
| 1 | 2/1/2001 | 3/1/2001|
| 1 | 2/1/2001 | 6/1/2001|
| 1 | 3/1/2001 | 4/1/2001|
| 2 | 1/1/2001 | 2/1/2001|
| 2 | 1/1/2001 | 6/1/2001|
| 2 | 2/1/2001 | 3/1/2001|
| 2 | 3/1/2001 | 4/1/2001|
It is already sorted by ID, From_Date, To_date.
What I want to do is delete the rows where the from_date is earlier than the to_date from the previous line and the ID is equal to the ID from the previous line. So in this example, I would delete the 3rd and 6th rows only.
I know I need some kind of looping structure to accomplish this, but I don't know how since I'm really looking at two rows at a time here. How can I accomplish this within Oracle?
EDIT: While using the 'LAG' function is quicker and easier, I end up deleting the 4th and 7th rows as well, which is not what I want. For example, when it gets to row 4, it should compare the 'from_date' to the 'to_date' from row 2 (instead of row 3, because row 3 should be deleted).

You could use the LAG window function to identify these rows:
DELETE FROM my_table
WHERE rowid IN (SELECT rid
                FROM (SELECT rowid AS rid, from_date,
                             LAG(to_date) OVER
                               (PARTITION BY id
                                ORDER BY from_date, to_date)
                               AS lag_to_date
                      FROM my_table) t
                WHERE from_date < lag_to_date)
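A plain LAG compares each row with its immediate predecessor, even when that predecessor is itself due to be deleted, which is the behaviour described in the question's EDIT. The rule the question asks for (compare against the last row that was kept) is sequential. Below is a minimal Python sketch of that rule, not an Oracle solution. It assumes the dates are read as month/day/year (the ordering is the same either way), and `rows_to_delete` is a hypothetical helper name.

```python
from datetime import date

# Sample rows from the question as (id, from_date, to_date), already sorted.
# Assumption: dates are read as month/day/year.
rows = [
    (1, date(2001, 1, 1), date(2001, 2, 1)),
    (1, date(2001, 2, 1), date(2001, 3, 1)),
    (1, date(2001, 2, 1), date(2001, 6, 1)),
    (1, date(2001, 3, 1), date(2001, 4, 1)),
    (2, date(2001, 1, 1), date(2001, 2, 1)),
    (2, date(2001, 1, 1), date(2001, 6, 1)),
    (2, date(2001, 2, 1), date(2001, 3, 1)),
    (2, date(2001, 3, 1), date(2001, 4, 1)),
]


def rows_to_delete(sorted_rows):
    """Return 1-based positions of rows that start before the end of the
    last *kept* row with the same id."""
    doomed = []
    last_kept = {}  # id -> to_date of the most recent row that was kept
    for pos, (row_id, from_date, to_date) in enumerate(sorted_rows, start=1):
        prev_to = last_kept.get(row_id)
        if prev_to is not None and from_date < prev_to:
            doomed.append(pos)  # deleted rows do not update last_kept
        else:
            last_kept[row_id] = to_date
    return doomed


deleted = rows_to_delete(rows)
print(deleted)
```

On the sample data this marks only the 3rd and 6th rows, because rows 4 and 7 are compared with rows 2 and 5 respectively.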

Related

How to find two consecutive rows sorted by date, containing a specific value?

I have a table with the following structure and data in it:
| ID | Date | Result |
|---- |------------ |-------- |
| 1 | 30/04/2020 | + |
| 1 | 01/05/2020 | - |
| 1 | 05/05/2020 | - |
| 2 | 03/05/2020 | - |
| 2 | 04/05/2020 | + |
| 2 | 05/05/2020 | - |
| 2 | 06/05/2020 | - |
| 3 | 01/05/2020 | - |
| 3 | 02/05/2020 | - |
| 3 | 03/05/2020 | - |
| 3 | 04/05/2020 | - |
I'm trying to write an SQL query (I'm using SQL Server) which returns the date of the first two consecutive negative results for a given ID.
For example, for ID no. 1, the first two consecutive negative results are on 01/05 and 05/05.
The first two consecutive negative results for ID no. 2 are on 05/05 and 06/05.
The first two consecutive negative results for ID no. 3 are on 01/05 and 02/05.
So the query should produce the following result:
| ID | FirstNegativeDate |
|---- |------------------- |
| 1 | 01/05 |
| 2 | 05/05 |
| 3 | 01/05 |
Please note that the dates aren't necessarily one day apart. Sometimes, two consecutive negative tests may be several days apart. But they should still be considered as "consecutive negative tests". In other words, two negative tests are not 'consecutive' only if there is a positive test result in between them.
How can this be done in SQL? I've done some reading and it looks like maybe the PARTITION BY statement is required but I'm not sure how it works.
This is a gaps-and-islands problem, where you want the start of the first island of '-'s that contains at least two rows.
I would recommend lead() and aggregation:
select id, min(date) first_negative_date
from (
    select t.*, lead(result) over(partition by id order by date) lead_result
    from mytable t
) t
where result = '-' and lead_result = '-'
group by id
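To see why pairing each row with the next row's result is enough, here is a small Python sketch that emulates the lead()-and-aggregate idea on the question's sample data. It only illustrates the logic, it is not SQL Server code, and `first_negative_pair` is a hypothetical name.

```python
from datetime import date

# Sample rows from the question as (id, date, result).
rows = [
    (1, date(2020, 4, 30), "+"),
    (1, date(2020, 5, 1), "-"),
    (1, date(2020, 5, 5), "-"),
    (2, date(2020, 5, 3), "-"),
    (2, date(2020, 5, 4), "+"),
    (2, date(2020, 5, 5), "-"),
    (2, date(2020, 5, 6), "-"),
    (3, date(2020, 5, 1), "-"),
    (3, date(2020, 5, 2), "-"),
    (3, date(2020, 5, 3), "-"),
    (3, date(2020, 5, 4), "-"),
]


def first_negative_pair(rows):
    """Map each id to the date of the first '-' that is directly followed by another '-'."""
    by_id = {}
    for row_id, day, result in sorted(rows):  # partition by id, order by date
        by_id.setdefault(row_id, []).append((day, result))
    first = {}
    for row_id, tests in by_id.items():
        # zip(tests, tests[1:]) pairs each row with the next one, like lead().
        for (day, result), (_, next_result) in zip(tests, tests[1:]):
            if result == "-" and next_result == "-":
                first[row_id] = day  # earliest match, like min(date)
                break
    return first


first = first_negative_pair(rows)
for row_id, day in first.items():
    print(row_id, day.strftime("%d/%m"))
```

The gap between two tests does not matter here; only the result of the next row for the same ID is inspected.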
Use the LEAD or LAG functions over a partition by ID, ordered by your Date column.
Then simply check whether the LEAD/LAG column is equal to Result.
You'll also need to filter for the first match per ID.
The image attached just shows what LEAD/LAG would return

Comparing two tables that are the same and listing out the max date

I was wondering if it's possible to compare dates within the same table for the same ID; the catch is that there is an additional column that displays the status. For instance, here's table A:
The result I would like to see is this:
I know I could use GROUP BY and a MAX aggregate on ID to find the max date; however, I would also like the associated status (Running/Stopped) column to be there. It would help me a lot.
In most databases, the fastest method (assuming the right indexes) is a correlated subquery:
select t.*
from t
where t.date = (select max(t2.date) from t t2 where t2.id = t.id);
Even if not the fastest, this should work in any database.
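As a quick way to try the correlated subquery, the sketch below runs it against an in-memory SQLite database through Python's standard sqlite3 module. The rows are the sample data used elsewhere in this thread; the column types and the added ORDER BY are assumptions made only to keep the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, status TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "Running", "2018-02-03"),
        (1, "Stopped", "2018-04-04"),
        (2, "Running", "2018-03-24"),
        (2, "Stopped", "2018-01-02"),
        (3, "Running", "2018-06-12"),
        (3, "Stopped", "2018-06-12"),
    ],
)

# Keep every row whose date equals the latest date for its id.
latest = conn.execute(
    """
    SELECT t.id, t.status, t.date
    FROM t
    WHERE t.date = (SELECT MAX(t2.date) FROM t t2 WHERE t2.id = t.id)
    ORDER BY t.id, t.status
    """
).fetchall()

for row in latest:
    print(row)
conn.close()
```

Note that ID 3 comes back twice, because both of its rows share the latest date; this approach returns all tied rows rather than picking one.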
In case of Oracle, you can use the KEEP clause like this:
SELECT t.id,
       MAX(t.status) KEEP (DENSE_RANK LAST ORDER BY t."DATE") AS corresponding_status,
       MAX(t."DATE") AS last_date
FROM tab t
GROUP BY t.id
ORDER BY 1
For this sample data:
+----+---------+------------+
| ID | STATUS | DATE |
+----+---------+------------+
| 1 | Running | 2018-02-03 |
| 1 | Stopped | 2018-04-04 |
| 2 | Running | 2018-03-24 |
| 2 | Stopped | 2018-01-02 |
| 3 | Running | 2018-06-12 |
| 3 | Stopped | 2018-06-12 |
+----+---------+------------+
This would return this result:
+----+----------------------+------------+
| ID | CORRESPONDING_STATUS | LAST_DATE |
+----+----------------------+------------+
| 1 | Stopped | 2018-04-04 |
| 2 | Running | 2018-03-24 |
| 3 | Stopped | 2018-06-12 |
+----+----------------------+------------+
As can be seen in this SQL Fiddle.
When there are multiple entries for the same ID and DATE combination, it will choose one STATUS value, in this case the last one in alphanumeric order, because I've used MAX on STATUS.
The part LAST ORDER BY t."DATE" determines which DATE value in the group is used, i.e. the last DATE in the group.
See this Oracle Docs entry for more details.
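For readers without an Oracle instance at hand, the following Python sketch mimics what the KEEP (DENSE_RANK LAST ...) query does on the sample data above: within each ID, only the rows tied on the latest date are considered, and MAX then picks one status among them. It is an illustration of the semantics, and `last_status_per_id` is a hypothetical name.

```python
# Sample rows from the answer as (id, status, date); ISO date strings sort correctly as text.
rows = [
    (1, "Running", "2018-02-03"),
    (1, "Stopped", "2018-04-04"),
    (2, "Running", "2018-03-24"),
    (2, "Stopped", "2018-01-02"),
    (3, "Running", "2018-06-12"),
    (3, "Stopped", "2018-06-12"),
]


def last_status_per_id(rows):
    """Return (id, corresponding_status, last_date) per id, ordered by id."""
    groups = {}
    for row_id, status, day in rows:
        groups.setdefault(row_id, []).append((day, status))
    result = []
    for row_id in sorted(groups):
        last_date = max(day for day, _ in groups[row_id])
        # Only rows tied on the last date take part in the aggregate;
        # max() then resolves the tie alphabetically, as MAX(status) does.
        status = max(s for day, s in groups[row_id] if day == last_date)
        result.append((row_id, status, last_date))
    return result


summary = last_status_per_id(rows)
for line in summary:
    print(line)
```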

Trying to create a Teradata view to aggregate how long rows of a specific ID have had a certain value

I have a test report table, that writes a row after each run of a test.
Let's say this is the data:
| main_id | status | date |
|---------|--------|---------|
| 123 | pass | Jan 1st |
| 123 | fail | Jan 2nd |
| 123 | fail | Jan 3rd |
| 123 | fail | Jan 4th |
I want to make a view that for each test, will list how long it has been failing.
Essentially, the corresponding row for the above data would look like this:
| main_id | days_failing |
|---------|--------------|
| 123 | 3 |
Using Teradata SQL, how could I check each row in the source table, looking for the last success, and then sum up all the subsequent failures?
Edit: Note that there would be many different "main_id"s in the source table, I would need 1 row in the view for every unique failing test in the source table.
Thanks
select  main_id
       ,count (*) - 1 as days_failing
from   (select  main_id
               ,"date"
        from    t
        qualify "date" >= max (case status when 'pass' then "date" end) over (partition by main_id)
       ) t
group by main_id
order by main_id
;
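The query keeps, per main_id, the rows dated on or after the last pass and then subtracts one for the passing row itself. The Python sketch below walks through the same steps on the sample data. The year 2021 is an assumption (the question only gives "Jan 1st" and so on), and `days_failing` is a hypothetical helper name.

```python
from datetime import date

# Sample rows from the question as (main_id, status, date); the year is assumed.
rows = [
    (123, "pass", date(2021, 1, 1)),
    (123, "fail", date(2021, 1, 2)),
    (123, "fail", date(2021, 1, 3)),
    (123, "fail", date(2021, 1, 4)),
]


def days_failing(rows):
    """Count rows after the last pass for each main_id."""
    last_pass = {}  # main_id -> date of the most recent pass
    for main_id, status, day in rows:
        if status == "pass":
            last_pass[main_id] = max(day, last_pass.get(main_id, day))
    counts = {}
    for main_id, status, day in rows:
        # Like the QUALIFY filter: keep rows on or after the last pass.
        # Ids that never passed have no last pass, so none of their rows qualify.
        if main_id in last_pass and day >= last_pass[main_id]:
            counts[main_id] = counts.get(main_id, 0) + 1
    # Subtract 1 to exclude the passing row itself.
    return {main_id: n - 1 for main_id, n in sorted(counts.items())}


failing = days_failing(rows)
print(failing)
```

Two consequences of this logic are worth keeping in mind: a test whose latest run passed appears with a count of 0, and a test that has never passed does not appear at all.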

SQL deleting rows with duplicate dates conditional upon values in two columns

I have data on approx 1000 individuals, where each individual can have multiple rows, with multiple dates and where the columns indicate the program admitted to and a code number.
I need each row to contain a distinct date, so I need to delete the rows of duplicate dates from my table. Where there are multiple rows with the same date, I need to keep the row that has the lowest code number. In the case of more than one row having both the same date and the same lowest code, then I need to keep the row that also has been in program (prog) B. For example;
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-06-02 | 211 | B |
| 1 | 1997-08-19 | 67 | A |
| 1 | 1997-08-19 | 23 | A |
So my desired output would look like this;
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-08-19 | 23 | A |
I'm struggling to come up with a solution to this, so any help greatly appreciated!
Microsoft SQL Server 2012 (X64)
The following works with your test data
SELECT ID, date, MIN(code) AS code, MAX(prog) AS prog
FROM mytable
GROUP BY ID, date
You can then use the results of this query to create a new table or populate a new table. Or to delete all records not returned by this query.
SQLFiddle http://sqlfiddle.com/#!9/0ebb5/5
You can use the min() function: (See the details here)
select ID, DATE, min(CODE), max(PROG)
from mytable
group by ID, DATE
I assume that your table has a valid primary key. However, I would recommend that you use ID as the primary key. Hope this helps.
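One caveat with the aggregate approach: the minimum code and the maximum prog are computed independently, so in general they can come from different rows. The rule in the question is row-wise (lowest code first, then prefer prog B on a tie). A minimal Python sketch of that row-wise rule on the question's sample data follows; `dedupe` is a hypothetical name and this is not SQL Server code.

```python
# Sample rows from the question as (id, date, code, prog).
rows = [
    (1, "1996-08-16", 24, "A"),
    (1, "1997-06-02", 123, "A"),
    (1, "1997-06-02", 123, "B"),
    (1, "1997-06-02", 211, "B"),
    (1, "1997-08-19", 67, "A"),
    (1, "1997-08-19", 23, "A"),
]


def dedupe(rows):
    """Keep one whole row per (id, date): lowest code, then prog 'B' on a tie."""
    best = {}  # (id, date) -> (rank, row)
    for row in rows:
        row_id, day, code, prog = row
        # Lower code wins; on a code tie, prog "B" wins (False sorts before True).
        rank = (code, prog != "B")
        key = (row_id, day)
        if key not in best or rank < best[key][0]:
            best[key] = (rank, row)
    return sorted(row for _, row in best.values())


kept = dedupe(rows)
for row in kept:
    print(row)
```

Because a complete row is kept each time, the code and prog values in the output always belong together.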

Analytic function - Comparing values using LAG()

Assume following data:
| Col1 | Col2 |
| 3 | 20-dec-15 |
| 4 | 20-dec-15 |
| 8 | 25-dec-15 |
|10 | 25-dec-15 |
I have to compare the values of column Col1 for a particular date.
For example: for 20-dec-15 a change occurred, as 3 changed to 4.
I have to solve this using an analytical function.
Following is the query which I am using
decode(LAG(Col1,1,Col1) OVER (partition by Col2 order by Col2),Col1,0,1) Changes
As Col2 is a date column, partitioning by date is not working for me. Can a date column be used in PARTITION BY?
Expected Result should be:
| Changes |
| 0 |
| 1 |
| 0 |
| 1 |
Here 1 means a change occurred when comparing values for the same date.
You need to use trunc() in the partition clause in order to reset the time part to 00:00:00, but you should still keep order by col2 so that all rows on the same day are ordered by the time part.
I also prefer an explicit case for this kind of comparison; personally, I find decode() really hard to read:
select case
         when col1 = lag(col1, 1, col1) over (partition by trunc(col2) order by col2) then 0
         else 1
       end as changes
from the_table;
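The effect of partitioning by the truncated date while ordering by the full timestamp can be sketched in Python as follows. The time-of-day values are assumptions added for illustration (the question only shows the date part), and `changes` is a hypothetical helper name.

```python
from datetime import datetime

# Rows as (col1, col2); the times of day are assumed, not taken from the question.
rows = [
    (3, datetime(2015, 12, 20, 9, 0)),
    (4, datetime(2015, 12, 20, 15, 30)),
    (8, datetime(2015, 12, 25, 8, 0)),
    (10, datetime(2015, 12, 25, 18, 45)),
]


def changes(rows):
    """Return 0/1 flags: 1 when col1 differs from the previous row on the same calendar day."""
    flags = []
    previous = {}  # calendar day -> col1 of the previous row on that day
    for col1, col2 in sorted(rows, key=lambda r: r[1]):  # order by col2
        day = col2.date()  # plays the role of trunc(col2)
        # lag(col1, 1, col1): the first row of a day is compared with itself.
        prev = previous.get(day, col1)
        flags.append(0 if col1 == prev else 1)
        previous[day] = col1
    return flags


flags = changes(rows)
print(flags)
```

Without the truncation, each distinct timestamp would form its own partition and every row would be compared only with itself, so no change would ever be flagged.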