I have a table like this:
Id Begin_Date End_date
1 01-JAN-12 05-JAN-12
1 01-FEB-12 01-MAR-12
1 15-FEB-12 05-MAR-12
For a given Id, it gives a set of date ranges. Let's say that if a date is between the begin and end date for that Id, then that Id is "on"; otherwise it is "off".
The problem here is these last two rows: the date ranges overlap and contradict each other. The second row claims that 1 was "on" between 01-FEB-12 and 01-MAR-12, but the third row claims that 1 was off before 15-FEB-12. Similarly, the second row claims that 1 was off on 02-MAR-12, but row 3 claims it was on.
The reconciliation logic I'd like to apply is that, in cases of contradictions, pick the earliest possible begin date and the earliest possible end date after it. The result would therefore be:
Id Begin_Date End_date
1 01-JAN-12 05-JAN-12
1 01-FEB-12 01-MAR-12
I was able to pull this off with the lag analytical function, but I ran into difficulty with other use cases. Take this input data set.
Id Begin_Date End_date
1 01-JAN-12 10-JAN-12
1 5-JAN-12 8-JAN-12
1 12-JAN-12 15-JAN-12
1 1-JAN-12 14-JAN-12
What I expect here as output is:
Id Begin_Date End_date
1 01-JAN-12 8-JAN-12
1 12-JAN-12 14-JAN-12
...because the first row starts at the earliest begin date, and its end date is the earliest end date after that. The next row starts at the earliest begin date after the previous end date, and its end date is the earliest end date after that. There are no begin dates after 14-JAN-12, so I'm done.
I'm having very little luck solving this problem. One approach I tried was getting the rank partitioned by id and comparing it to the max rank. I then used the lag function to compare to previous ranks. However, this strategy fails completely for the use case above.
Any suggestions?
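The rule stated above can be prototyped outside the database before writing any SQL. The following is a minimal Python sketch (not code from the question) that applies the rule greedily for each Id. It assumes that the next begin date must be strictly after the previous end date, while an end date may fall on its own begin date; `reconcile` is a hypothetical helper name.

```python
from datetime import date
from itertools import groupby


def reconcile(rows):
    """Greedy reconciliation of (id, begin, end) rows, done separately per id.

    Assumption: the next begin date must be strictly after the previous end
    date, and an end date may fall on the same day as its begin date.
    """
    result = []
    for id_, group in groupby(sorted(rows), key=lambda r: r[0]):
        group = list(group)
        begins = sorted(r[1] for r in group)
        ends = sorted(r[2] for r in group)
        prev_end = None
        while True:
            # Earliest begin date after the previous end date (any date for the first range).
            begin = next((b for b in begins if prev_end is None or b > prev_end), None)
            if begin is None:
                break
            # Earliest end date on or after that begin date.
            end = next((e for e in ends if e >= begin), None)
            if end is None:
                break
            result.append((id_, begin, end))
            prev_end = end
    return result


second_data_set = [
    (1, date(2012, 1, 1), date(2012, 1, 10)),
    (1, date(2012, 1, 5), date(2012, 1, 8)),
    (1, date(2012, 1, 12), date(2012, 1, 15)),
    (1, date(2012, 1, 1), date(2012, 1, 14)),
]
for row in reconcile(second_data_set):
    print(row)
```

Following the rule as stated, the second data set comes out as 01-JAN-12 to 08-JAN-12 and 12-JAN-12 to 14-JAN-12, and the first data set as 01-JAN-12 to 05-JAN-12 and 01-FEB-12 to 01-MAR-12.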
Well, the critical requirement rests on this:
The reconciliation logic I'd like to apply is that, in cases of contradictions, pick the earliest possible begin date and the earliest possible end date after it.
sqlfiddle here
CREATE TABLE table1
(
id INT,
DateStart DATE,
DateEnd DATE
);
INSERT INTO table1
VALUES
(1, TO_DATE('20110101','YYYYMMdd'), TO_DATE('20110110','YYYYMMdd'));
INSERT INTO table1
VALUES
(2, TO_DATE('20110105','YYYYMMdd'), TO_DATE('20110108','YYYYMMdd'));
INSERT INTO table1
VALUES
(3, TO_DATE('20110112','YYYYMMdd'), TO_DATE('20110115','YYYYMMdd'));
INSERT INTO table1
VALUES
(4, TO_DATE('20110101','YYYYMMdd'), TO_DATE('20110114','YYYYMMdd'));
INSERT INTO table1
VALUES
(5, TO_DATE('20110206','YYYYMMdd'), TO_DATE('20110208','YYYYMMdd'));
INSERT INTO table1
VALUES
(6, TO_DATE('20110201','YYYYMMdd'), TO_DATE('20110207','YYYYMMdd'));
The select statement:
SELECT ID, DATESTART, DATEEND
FROM
(
SELECT ID, TYPE, DATES AS DATESTART,
LEAD(DATES) OVER (ORDER BY DATES) AS DATEEND
FROM
(
SELECT ID, TYPE,DATES,
LAG(ID) OVER (ORDER BY DATES) AS LASTID,
LAG(TYPE) OVER (ORDER BY DATES) AS LASTTYPE,
LAG(DATES) OVER (ORDER BY DATES) AS LASTDATES
FROM
(
SELECT ID,'START' AS TYPE,DATESTART AS DATES
FROM table1
UNION ALL
SELECT ID,'END',DATEEND
FROM table1
)
) H
WHERE TYPE != LASTTYPE OR LASTTYPE IS NULL
)
WHERE TYPE = 'START'
ORDER BY DATESTART
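Here's a step-by-step explanation of each subquery:
1. Explode each row's start date and end date into one column (DATES), tagged START or END.
2. Use LAG to copy the previous row's TYPE into the current row (LASTTYPE).
3. Keep only the first row of each run of the same type (e.g. for START, START, END, END, keep the first START and the first END).
4. Use LEAD to take the date of the next remaining row as the end date; after the filter, START and END rows alternate, so each START is followed by its END.
5. Keep only the useful rows, those with TYPE = START.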
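To try these steps without an Oracle instance, here is a sketch that runs a SQLite translation of the query from Python (SQLite 3.25 or later is assumed, for window functions). Two changes are additions, not part of the answer above: `PARTITION BY id`, so each Id is reconciled separately (the original treats all rows as one timeline), and `type DESC` as a tie-breaker, so that on the same date a START sorts before an END. The data is the question's second data set with ISO date strings; `reconcile_sql` is a hypothetical helper.

```python
import sqlite3

# SQLite translation of the Oracle query above. PARTITION BY id and the
# "type DESC" tie-breaker are additions ('START' > 'END', so DESC puts a
# START before an END that falls on the same date).
QUERY = """
SELECT id, datestart, dateend
FROM (
    SELECT id, type, dates AS datestart,
           LEAD(dates) OVER (PARTITION BY id ORDER BY dates, type DESC) AS dateend
    FROM (
        -- step 2: copy the previous event's type into the current row
        SELECT id, type, dates,
               LAG(type) OVER (PARTITION BY id ORDER BY dates, type DESC) AS lasttype
        FROM (
            -- step 1: one row per start date and one per end date
            SELECT id, 'START' AS type, datestart AS dates FROM table1
            UNION ALL
            SELECT id, 'END', dateend FROM table1
        )
    )
    -- step 3: keep only the first event of each run of the same type
    WHERE lasttype IS NULL OR type != lasttype
)
-- step 5: each START row now carries the next kept END as its end date (step 4)
WHERE type = 'START'
ORDER BY id, datestart
"""


def reconcile_sql(rows):
    """Load (id, start, end) rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (id INTEGER, datestart TEXT, dateend TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


second_data_set = [
    (1, "2012-01-01", "2012-01-10"),
    (1, "2012-01-05", "2012-01-08"),
    (1, "2012-01-12", "2012-01-15"),
    (1, "2012-01-01", "2012-01-14"),
]
print(reconcile_sql(second_data_set))
```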
For the second data set:
Id Begin_Date End_date
1 01-JAN-12 10-JAN-12
1 5-JAN-12 8-JAN-12
1 12-JAN-12 15-JAN-12
1 1-JAN-12 14-JAN-12
After your reconciliation logic, the result would be:
Id Begin_Date End_date
1 01-JAN-12 8-JAN-12 (includes the rows 1,2 and 4 -> minimum begin_date is 1-JAN, minimum end_date is 8-JAN)
1 12-JAN-12 14-JAN-12 (starts with row 3; the earliest end date after 12-JAN is 14-JAN, from row 4)
Related
I need to calculate the average number of days between dates when there are two or more dates for each ID: the days between date1 and date2, date2 and date3, etc. The output needs to be the average number of days between each interval per ID. I am looking for a solution that iterates through each date for each ID and then averages the number of days.
I could create a row number and partition by the id but in the actual data there can be up to 20 rows for each ID.
CREATE TABLE #ATABLE(
ID INTEGER NOT NULL
,DATE DATE NOT NULL
);
INSERT INTO #ATABLE(ID,DATE) VALUES (1,'1/1/2019');
INSERT INTO #ATABLE(ID,DATE) VALUES (2,'1/1/2019');
INSERT INTO #ATABLE(ID,DATE) VALUES (2,'1/10/2019');
INSERT INTO #ATABLE(ID,DATE) VALUES (2,'1/20/2019');
INSERT INTO #ATABLE(ID,DATE) VALUES (2,'1/30/2019');
INSERT INTO #ATABLE(ID,DATE) VALUES (3,'1/1/2019');
INSERT INTO #ATABLE(ID,DATE) VALUES (3,'1/10/2019');
--get avg days between orders
DROP TABLE #ATABLE
The output for the above would be:
ID AvgDatediff
1 Null
2 10
3 9
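You can use LAG to get the adjacent row's date for each row, and then find the difference between it and the current row's date. (Because the window is ordered by date descending, LAG returns the next later date, so the differences are positive.) Then you can average them: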
SELECT id, AVG(diff)
FROM (SELECT id,
DATEDIFF(DAY, date, LAG(date) OVER (PARTITION BY id
ORDER BY date DESC)) AS diff
FROM #atable) t
GROUP BY id;
The simplest way to get the average difference is:
SELECT id, DATEDIFF(DAY, MIN(date), MAX(date)) / NULLIF(COUNT(*) - 1, 0)
FROM #atable
GROUP BY id;
Note: Multiply by 1.0 if you don't want an integer average, because T-SQL divides one integer by another with integer division.
In other words, the average difference is the latest date minus the earliest date divided by one less than the count. Try it. It works.
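Why the shortcut works: the consecutive gaps telescope, (d2 - d1) + (d3 - d2) + ... + (dn - dn-1) = dn - d1, so their mean is (dn - d1) / (n - 1). The sketch below checks this in SQLite from Python rather than SQL Server (an assumption, for portability): `julianday()` stands in for DATEDIFF, and the date column is renamed `dt`. Because julianday() returns a real number, no * 1.0 is needed here.

```python
import sqlite3

# The question's sample data, with ISO date strings.
ROWS = [
    (1, "2019-01-01"),
    (2, "2019-01-01"), (2, "2019-01-10"), (2, "2019-01-20"), (2, "2019-01-30"),
    (3, "2019-01-01"), (3, "2019-01-10"),
]

# Average of the consecutive gaps, computed with LAG per id.
LAG_AVG = """
SELECT id, AVG(diff) FROM (
    SELECT id,
           julianday(dt) - julianday(LAG(dt) OVER (PARTITION BY id ORDER BY dt)) AS diff
    FROM atable)
GROUP BY id ORDER BY id
"""

# Shortcut: (latest - earliest) / (count - 1); NULLIF turns a single-date id into NULL.
SPAN_AVG = """
SELECT id, (julianday(MAX(dt)) - julianday(MIN(dt))) / NULLIF(COUNT(*) - 1, 0)
FROM atable GROUP BY id ORDER BY id
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE atable (id INTEGER, dt TEXT)")
con.executemany("INSERT INTO atable VALUES (?, ?)", ROWS)
lag_result = con.execute(LAG_AVG).fetchall()
span_result = con.execute(SPAN_AVG).fetchall()
print(lag_result)
print(span_result)
```

Both queries return NULL for ID 1, about 9.67 for ID 2 (29 days over 3 gaps) and 9.0 for ID 3.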
SELECT id, AVG(DayDiff)
FROM (
SELECT id,
DATEDIFF(dd, date, LEAD(date) OVER (PARTITION BY id ORDER BY date)) AS DayDiff
FROM #atable
) as AA
GROUP BY id;
LEAD(source_column) returns the value from the next row, as determined by the ORDER BY clause (here, date).
I am not very good with queries and databases.
I have the following data table:
Date ID Value
20160601 1 300
20160607 1 301
20160601 2 600
20160607 2 601
20160501 1 250
20160507 1 240
20160501 2 800
20160507 2 801
My requirement is to select the last date of a given month for each ID and show the value.
For example, if I choose month 5, the result would be:
Date ID Value
20160507 1 240
20160507 2 801
And so on, based on the month the user enters.
I know it may look simple but I am really stuck and I would appreciate some help. Thanks.
Assuming date is an actual date column (as it should be), you can use extract to compare the month value, and then the row_number() over ... analytic function to get the latest row per id value:
select date, id, value
from (select date, id, value,
row_number() over (partition by id order by date desc) as rn
from tbl
where extract(month from date) = 5)
where rn = 1
Of course, I assume that your actual date column is called something else, as date is a reserved word.
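A sketch of the ROW_NUMBER() approach, run in SQLite from Python. SQLite has no EXTRACT, so `strftime('%m', dt)` is used instead; the YYYYMMDD values are assumed to be stored as ISO date strings in a column named `dt` (avoiding the reserved word), and `last_row_per_id` is a hypothetical helper. Like the original, it ignores the year.

```python
import sqlite3

# Sample rows from the question, with the dates stored as ISO strings (an assumption).
ROWS = [
    ("2016-06-01", 1, 300), ("2016-06-07", 1, 301),
    ("2016-06-01", 2, 600), ("2016-06-07", 2, 601),
    ("2016-05-01", 1, 250), ("2016-05-07", 1, 240),
    ("2016-05-01", 2, 800), ("2016-05-07", 2, 801),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (dt TEXT, id INTEGER, value INTEGER)")
con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)


def last_row_per_id(month):
    """Latest row per id within the given month number (the year is ignored)."""
    return con.execute(
        """
        SELECT dt, id, value FROM (
            SELECT dt, id, value,
                   -- rn = 1 marks the latest date for each id within the month
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn
            FROM tbl
            WHERE CAST(strftime('%m', dt) AS INTEGER) = ?)
        WHERE rn = 1
        ORDER BY id
        """,
        (month,),
    ).fetchall()


print(last_row_per_id(5))
```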
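Find the maximum date within the month, then select all rows with that date. (This gives the last row for each ID only if every ID has a row on that date, as in the sample data.)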
select *
from table
where date = (select max(date) from table where date like '201605%')
I'm trying to count the number of months that have passed for each ID. For some records the months will not increase by 1 each time (e.g. someone could have a record for 1/1/13 and 3/1/13 but not 2/1/13); however, I only want a count of the records in my table, so missing months don't matter.
An example table would be (notice the missing month and its irrelevance):
DATE ID Months Passed
----------- --- --------------
2013-11-01 105 1
2013-12-01 105 2
2014-02-01 105 3
2014-03-01 105 4
Essentially, I want an Excel COUNTIFS in SQL, which I've written as:
=COUNTIFS(IDColumn, ID, MonthColumn, "<=" & Month)
Does anyone know of a way to generate the desired column using SQL?
Try ROW_NUMBER(). If you just want the "Months Passed" column to increase by 1 each time, and for each ID, that will do the trick.
SELECT
Date,
Id,
Indicator,
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Date) AS RowNum
FROM YourTable
WHERE Indicator = 'YES'
UNION
SELECT
Date,
Id,
Indicator,
0 AS RowNum
FROM YourTable
WHERE Indicator = 'NO'
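A sketch comparing ROW_NUMBER() with a literal translation of the COUNTIFS formula (a correlated subquery that counts the ID's rows dated on or before the current row), run in SQLite from Python on the question's sample. The table name `records` and the column name `dt` are assumptions, and the Indicator column from the answer is left out.

```python
import sqlite3

ROWS = [("2013-11-01", 105), ("2013-12-01", 105), ("2014-02-01", 105), ("2014-03-01", 105)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE records (dt TEXT, id INTEGER)")
con.executemany("INSERT INTO records VALUES (?, ?)", ROWS)

# Running count per ID via ROW_NUMBER(); missing months do not matter.
by_row_number = con.execute(
    """
    SELECT dt, id, ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt) AS months_passed
    FROM records
    ORDER BY id, dt
    """
).fetchall()

# Literal COUNTIFS(IDColumn, ID, MonthColumn, "<=" & Month): count this ID's rows
# dated on or before the current row.
by_countifs = con.execute(
    """
    SELECT r.dt, r.id,
           (SELECT COUNT(*) FROM records AS r2
            WHERE r2.id = r.id AND r2.dt <= r.dt) AS months_passed
    FROM records AS r
    ORDER BY r.id, r.dt
    """
).fetchall()

for row in by_row_number:
    print(row)
```

The two agree as long as an ID has no duplicate dates; with duplicates, the COUNTIFS version gives tied rows the same count, while ROW_NUMBER() numbers them in an arbitrary order.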
You could more simply count rows grouped by month (this gets more complex if you need to count months in different years separately):
SELECT COUNT(derived.monthVal)
FROM (SELECT MONTH(<your date field>) AS monthVal
FROM [your table]
WHERE [Your ID Column] = <the id>
GROUP BY MONTH(<your date field>)) AS derived;
Let's say you have the following PostgreSQL sparse table listing reservation dates:
CREATE TABLE reserved_dates (
reserved_days_id SERIAL NOT NULL,
reserved_date DATE NOT NULL
);
INSERT INTO reserved_dates (reserved_date) VALUES
('2014-10-11'),
('2014-10-12'),
('2014-10-13'),
-- gap
('2014-10-15'),
('2014-10-16'),
-- gap
('2014-10-18'),
-- gap
('2014-10-20'),
('2014-10-21');
How do you aggregate those dates into continuous date ranges (ranges without gaps)? Such as:
start_date | end_date
------------+------------
2014-10-11 | 2014-10-13
2014-10-15 | 2014-10-16
2014-10-18 | 2014-10-18
2014-10-20 | 2014-10-21
This is what I came up with so far, but I can only get start_date this way:
WITH reserved_date_ranges AS (
SELECT reserved_date,
reserved_date
- LAG(reserved_date) OVER (ORDER BY reserved_date) AS difference
FROM reserved_dates
)
SELECT *
FROM reserved_date_ranges
WHERE difference > 1 OR difference IS NULL;
SELECT min(reserved_date) AS start_date
, max(reserved_date) AS end_date
FROM (
SELECT reserved_date
, reserved_date - row_number() OVER (ORDER BY reserved_date)::int AS grp
FROM reserved_dates
) sub
GROUP BY grp
ORDER BY grp;
Compute gap-less serial numbers in chronological order with the window function row_number(). Duplicate dates are not allowed. (I added a UNIQUE constraint in the fiddle.)
If your reserved_days_id happens to be gap-less and in chronological order, you can use that directly instead. But that's typically not the case.
Subtract that from reserved_date in each row (after converting it to integer). Consecutive days end up with the same date value grp, which has no purpose or meaning other than to form groups.
Aggregate in the outer query. Voilà.
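The same gaps-and-islands idea can be tried in SQLite from Python (an assumption; the answer itself targets PostgreSQL). SQLite stores these dates as text, so `julianday(reserved_date)` first converts each one to a day number, from which the row_number() is subtracted; the UNIQUE constraint mirrors the no-duplicates requirement.

```python
import sqlite3

DATES = [
    "2014-10-11", "2014-10-12", "2014-10-13",  # gap
    "2014-10-15", "2014-10-16",                # gap
    "2014-10-18",                              # gap
    "2014-10-20", "2014-10-21",
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE reserved_dates (reserved_date TEXT NOT NULL UNIQUE)")
con.executemany("INSERT INTO reserved_dates VALUES (?)", [(d,) for d in DATES])

# Consecutive days share the same (day number - row number) value, which serves as the group key.
ranges = con.execute(
    """
    SELECT MIN(reserved_date) AS start_date, MAX(reserved_date) AS end_date
    FROM (
        SELECT reserved_date,
               julianday(reserved_date) - ROW_NUMBER() OVER (ORDER BY reserved_date) AS grp
        FROM reserved_dates)
    GROUP BY grp
    ORDER BY grp
    """
).fetchall()

for start, end in ranges:
    print(start, "|", end)
```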
db<>fiddle here
Old sqlfiddle
Similar cases:
Rank based on sequence of dates
Group by repeating attribute
I need to compare rows in the same table of a query.
Here is an example of the table:
id checkin checkout
1 01/15/13 01/31/13
1 01/31/13 05/20/13
2 01/15/13 05/20/13
3 01/15/13 01/19/13
3 01/19/13 05/20/13
4 01/15/13 02/22/13
5 01/15/13 03/01/13
I compare the checkout date to today's date; if it is before today's date, I want to return the result. However, some ids, such as 1 and 3, have multiple records. If any record for an id has a checkout date after today's date, I don't want to return any of that id's records. I only want to return the records of an id when every one of its checkout dates is before today's date.
For this purpose, analytic functions are the best approach:
select id, checkin, checkout
from (select t.*, max(checkout) over (partition by id) as maxco
from t
) t
where maxco <= trunc(sysdate)
This assumes that the data is stored as date values and not as strings (otherwise, the max will return the wrong value).
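A sketch of the MAX() OVER (PARTITION BY id) approach in SQLite from Python. The dates are stored as ISO-8601 text, which (unlike MM/DD/YY strings) sorts and compares correctly as text. "Today" is fixed at a hypothetical 2013-04-01 so the output is reproducible, and `<` is used to match "before today" in the question (the answer's `<=` would also keep an id whose last checkout is today). In a live query, date('now') would replace the parameter.

```python
import sqlite3

ROWS = [
    (1, "2013-01-15", "2013-01-31"), (1, "2013-01-31", "2013-05-20"),
    (2, "2013-01-15", "2013-05-20"),
    (3, "2013-01-15", "2013-01-19"), (3, "2013-01-19", "2013-05-20"),
    (4, "2013-01-15", "2013-02-22"),
    (5, "2013-01-15", "2013-03-01"),
]
TODAY = "2013-04-01"  # hypothetical fixed "today" so the result is reproducible

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, checkin TEXT, checkout TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# maxco is the latest checkout of the row's id; keep an id only if even that is before today.
finished = con.execute(
    """
    SELECT id, checkin, checkout
    FROM (SELECT t.*, MAX(checkout) OVER (PARTITION BY id) AS maxco FROM t)
    WHERE maxco < ?
    ORDER BY id, checkin
    """,
    (TODAY,),
).fetchall()

print(finished)
```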
select id, checkin from
Table
where checkout < CURRENT_DATE --today's date in PostgreSQL; in Oracle, use TRUNC(SYSDATE)
and id not in (select id from Table where checkout >= CURRENT_DATE);
This returns the records whose checkout is before today, provided that every record with the same id also has a checkout before today.
SELECT Ta.*
FROM TABLE Ta,
(SELECT MAX(checkout) checkout, ID FROM TABLE GROUP BY ID) Tb
WHERE Ta.ID = Tb.ID
AND sysdate >= Ta.checkout -- checks for date in current record
AND sysdate >= Tb.checkout -- checks for dates in all records