I would like to subtract dates of two consecutive rows, using ORACLE's LAG-function (ORACLE version 19g):
SELECT CLIENT, ID, GROUP_A, GROUP_B, GROUP_C, DATE_A, DATE_B
(DATE_A - LAG(DATE_B, 1) OVER (PARTITION BY GROUP_A, ID
ORDER BY ID ASC, GROUP_A ASC, GROUP_B ASC)) AS DELTA_TIME_IN_DAYS
FROM MY_TABLE
As you can see, the problem is, that there can be multiple entries like in line 3 and 4.
Of course column "DELTA_TIME_IN_DAYS" entry in line 4 shouldn't be negative.
It should be "1" as result.
Do you have any suggestions to solve this problem?
I think you want to order by date_b, if you want non-negative numbers:
SELECT CLIENT, ID, GROUP_A, GROUP_B, GROUP_C, DATE_A, DATE_B
(DATE_A - LAG(DATE_B, 1) OVER (PARTITION BY GROUP_A, ID
ORDER BY ID DATE_B ASC
)
) AS DELTA_TIME_IN_DAYS
FROM MY_TABLE;
Note that you do not need to repeat the partitioning keys in the ORDER BY.
You may use range between to offset by groups of rows with identical sort ordinal rather than by physical rows like lag do by default. So, let's use last_value with offset of 1 group to do the same that lag does.
with a (id, start_dt, end_dt) as (
select 1, date '2021-01-01', date '2021-01-31' from dual union all
select 2, date '2021-02-01', date '2021-02-27' from dual union all
select 3, date '2021-03-01', date '2021-03-25' from dual union all
select 3, date '2021-03-01', date '2021-03-25' from dual union all
select 4, date '2021-04-01', date '2021-05-31' from dual union all
select 4, date '2021-04-01', date '2021-05-31' from dual
)
select
a.*
, last_value(end_dt) over(order by id asc range between unbounded preceding and 1 preceding) as dt_diff
from a
D
START_DT
END_DT
DT_DIFF
1
2021-01-01T00:00:00Z
2021-01-31T00:00:00Z
(null)
2
2021-02-01T00:00:00Z
2021-02-27T00:00:00Z
2021-01-31T00:00:00Z
3
2021-03-01T00:00:00Z
2021-03-25T00:00:00Z
2021-02-27T00:00:00Z
3
2021-03-01T00:00:00Z
2021-03-25T00:00:00Z
2021-02-27T00:00:00Z
4
2021-04-01T00:00:00Z
2021-05-31T00:00:00Z
2021-03-25T00:00:00Z
4
2021-04-01T00:00:00Z
2021-05-31T00:00:00Z
2021-03-25T00:00:00Z
Related
I have a process that occur every 30 days but can take few days.
How can I differentiate between each iteration in order to sum the output of the process?
for Example
the output I except is
Name
Date
amount
iteration (optional)
Sophia Liu
2016-01-01
4
1
Sophia Liu
2016-02-01
5
2
Nikki Leith
2016-01-02
5
1
Nikki Leith
2016-02-01
10
2
I tried using lag function on the date filed and using the difference between that column and the date column.
WITH base AS
(SELECT 'Sophia Liu' as name, DATE '2016-01-01' as date, 3 as amount
UNION ALL SELECT 'Sophia Liu', DATE '2016-01-02', 1
UNION ALL SELECT 'Sophia Liu', DATE '2016-02-01', 3
UNION ALL SELECT 'Sophia Liu', DATE '2016-02-02', 2
UNION ALL SELECT 'Nikki Leith', DATE '2016-01-02', 5
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-01', 5
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-02', 3
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-03', 1
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-04', 1)
select
name
,date
,lag(date) over (partition by name order by date) as lag_func
,date_diff(date,lag(date) over (partition by name order by date),day) date_differacne
,case when date_diff(date,lag(date) over (partition by name order by date),day) >= 10
or date_diff(date,lag(date) over (partition by name order by date),day) is null then true else false end as new_iteration
,amount
from base
Edited answer
After your clarification and looking at what's actually in your SQL code. I'm guessing you are looking for a solution to what's called a gaps and islands problem. That is, you want to identify the "islands" of activity and sum the amount for each iteration or island. Taking your example you can first identify the start of a new session (or "gap") and then use that to create a unique iteration ("island") identifier for each user. You can then use that identifier to perform a SUM().
gaps as (
select
name,
date,
amount,
if(date_diff(date, lag(date,1) over(partition by name order by date), DAY) >= 10, 1, 0) new_iteration
from base
),
islands as (
select
*,
1 + sum(new_iteration) over(partition by name order by date) iteration_id
from gaps
)
select
*,
sum(amount) over(partition by name, iteration_id) iteration_amount
from islands
Previous answer
Sounds like you just need a RANK() to count the iterations in your window functions. Depending on your need you can then sum cumulative or total amounts in a similar window function. Something like this:
select
name
,date
,rank() over (partition by name order by date) as iteration
,sum(amount) over (partition by name order by date) as cumulative_amount
,sum(amount) over (partition by name) as total_amount
,amount
from base
Ex:
Date - up,
1/2 - 1.1.127.0 ,
1/3 - 1.1.127.1,
1/3 - 1.1.127.0,
1/4 - 1.1.127.3,
1/4 - 1.1.127.5,
1/5 - 1.1.127.3,
Output:
Date-count,
1/2 - 1,
1/3 - 1,
1/4 - 2,
1/5 -0
New and unique ip logged in in each day
You want to count how many IPs exist for a date that have not occurred on a previous date. You want to use analytic functions for this.
The number of new IDs is the total number of distinct IDs on a date minus the number of the previous date. In order to get this, first select the running count per row. Then aggregate per date to get the distinct number of IDs per date. Then use LAG to get the difference per day.
select
date,
max(cnt) - lag(max(cnt)) over (order by date) as new_ips
from
(
select date, count(distinct ip) over (order by date) as cnt
from mytable
) running_counts
group by date
order by date;
The same without analytic functions, which is probably more readable:
select date, count(distinct ip) as cnt
from mytable
where not exists
(
select null
from mytable before
where before.date < mytable.date
and before.id = mytable.id
)
group by date
order by date;
The DISTINCT in this latter query is not necessary, if there can be no duplicates (two rows with the same date and IP) in the table.
You can also use below solution using left join.
with t (dt, ip) as (
select to_date( '1/2', 'MM/DD' ), '1.1.127.0' from dual union all
select to_date( '1/3', 'MM/DD' ), '1.1.127.1' from dual union all
select to_date( '1/3', 'MM/DD' ), '1.1.127.0' from dual union all
select to_date( '1/4', 'MM/DD' ), '1.1.127.3' from dual union all
select to_date( '1/4', 'MM/DD' ), '1.1.127.5' from dual union all
select to_date( '1/5', 'MM/DD' ), '1.1.127.3' from dual
)
select t.DT, count( decode(t2.IP, null, 1, null) ) cnt
from t
left join t t2
on ( t2.DT < t.DT and t2.IP = t.IP )
group by t.DT
order by 1
;
demo
I am working on a "counting days" problem almost identical to this one. I have a list of date(s), and need to count how many days used excluding duplicate, and handling the gaps. Same input and output.
From: Markus Jarderot
Input
ID d1 d2
1 2011-08-01 2011-08-08
1 2011-08-02 2011-08-06
1 2011-08-03 2011-08-10
1 2011-08-12 2011-08-14
2 2011-08-01 2011-08-03
2 2011-08-02 2011-08-06
2 2011-08-05 2011-08-09
Output
ID hold_days
1 11
2 8
SQL to find time elapsed from multiple overlapping intervals
But for the life of me I couldn't understand Markus Jarderot's solution.
SELECT DISTINCT
t1.ID,
t1.d1 AS date,
-DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
FROM Orders t1
LEFT JOIN Orders t2 -- Join for any events occurring while this
ON t2.ID = t1.ID -- is starting. If this is a start point,
AND t2.d1 <> t1.d1 -- it won't match anything, which is what
AND t1.d1 BETWEEN t2.d1 AND t2.d2 -- we want.
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0
Why is DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) picking from the min(d1) from the entire list? Is that regardless of ID.
And what does t1.d1 BETWEEN t2.d1 AND t2.d2 do? Is that to ensure only overlapped interval are calculated?
Same thing with group by, I think because if in the event the same identical period will be discarded? I tried to trace the solution by hand but getting more confused.
This is mostly a duplicate of my answer here (including explanation) but with the inclusion of grouping on an id column. It should use a single table scan and does not require a recursive sub-query factoring clause (CTE) or self joins.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE your_table ( id, usr, start_date, end_date ) AS
SELECT 1, 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
SELECT 1, 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
SELECT 1, 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
SELECT 1, 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
SELECT 1, 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
SELECT 1, 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
SELECT 1, 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
SELECT 1, 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
SELECT 1, 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
SELECT 1, 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL UNION ALL -- Within H and I
SELECT 2, 'K', DATE '2011-08-01', DATE '2011-08-08' FROM DUAL UNION ALL -- Your data below
SELECT 2, 'L', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 2, 'M', DATE '2011-08-03', DATE '2011-08-10' FROM DUAL UNION ALL
SELECT 2, 'N', DATE '2011-08-12', DATE '2011-08-14' FROM DUAL UNION ALL
SELECT 3, 'O', DATE '2011-08-01', DATE '2011-08-03' FROM DUAL UNION ALL
SELECT 3, 'P', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 3, 'Q', DATE '2011-08-05', DATE '2011-08-09' FROM DUAL;
Query 1:
SELECT id,
SUM( days ) AS total_days
FROM (
SELECT id,
dt - LAG( dt ) OVER ( PARTITION BY id
ORDER BY dt ) + 1 AS days,
start_end
FROM (
SELECT id,
dt,
CASE SUM( value ) OVER ( PARTITION BY id
ORDER BY dt ASC, value DESC, ROWNUM ) * value
WHEN 1 THEN 'start'
WHEN 0 THEN 'end'
END AS start_end
FROM your_table
UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
)
WHERE start_end IS NOT NULL
)
WHERE start_end = 'end'
GROUP BY id
Results:
| ID | TOTAL_DAYS |
|----|------------|
| 1 | 25 |
| 2 | 13 |
| 3 | 9 |
The brute force method is to create all days (in a recursive query) and then count:
with dates(id, day, d2) as
(
select id, d1 as day, d2 from mytable
union all
select id, day + 1, d2 from dates where day < d2
)
select id, count(distinct day)
from dates
group by id
order by id;
Unfortunately there is a bug in some Oracle versions and recursive queries with dates don't work there. So try this code and see whether it works in your system. (I have Oracle 11.2 and the bug still exists there; so I guess you need Oracle 12c.)
I guess Markus' idea is to find all starting points that are not within other ranges and all ending points that aren't. Then just take the first starting point till the first ending point, then the next starting point till the next ending point, etc. As Markus isn't using a window function to number starting and ending points, he must find a more complicated way to achieve this. Here is the query with ROW_NUMBER. Maybe this gives you a start what to look for in Markus' query.
select startpoint.id, sum(endpoint.day - startpoint.day)
from
(
select id, d1 as day, row_number() over (partition by id order by d1) as rn
from mytable m1
where not exists
(
select *
from mytable m2
where m1.id = m2.id
and m1.d1 > m2.d1 and m1.d1 <= m2.d2
)
) startpoint
join
(
select id, d2 as day, row_number() over (partition by id order by d1) as rn
from mytable m1
where not exists
(
select *
from mytable m2
where m1.id = m2.id
and m1.d2 >= m2.d1 and m1.d2 < m2.d2
)
) endpoint on endpoint.id = startpoint.id and endpoint.rn = startpoint.rn
group by startpoint.id
order by startpoint.id;
If all your intervals start at different dates, consider them in ascending order by d1 counting how many days are from d1 to the next interval.
You can discard an interval of it is contained in another one.
The last interval won't have a follower.
This query should give you how many days each interval give
select a.id, a.d1,nvl(min(b.d1), a.d2) - a.d1
from orders a
left join orders b
on a.id = b.id and a.d1 < b.d1 and a.d2 between b.d1 and b.d2
group by a.id, a.d1
Then group by id and sum days
I am trying to get the dates of last status changes. Below is an example data table.
In brief I want to query the minimum DATE value of the latest STATUS (ordered by CHANGE_NO) for each PRODUCT_ID. Mentioned values are the ones filled with yellow.
So far, I could get only the latest dates for each product.
SELECT
*
FROM
(
SELECT
PRODUCT_ID, CHANGE_NO, STATUS, DATE
,MAX(CHANGE_NO) OVER(PARTITION BY PRODUCT_ID) MAX_CHANGE_NO
FROM TABLE
ORDER BY PRODUCT_ID, CHANGE_NO
)
WHERE MAX_CHANGE_NO = CHANGE_NO
Please kindly share the link if there is already a question/answer for a similar case; I've searched but couldn't find any.
Note: I am using Oracle SQL.
Thanks in advance.
Here's one way to do this with analytic functions (avoiding joins).
with
test_data ( product_id, change_no, status, dt ) as (
select 1, 1, 'A', date '2016-10-10' from dual union all
select 1, 2, 'B', date '2016-10-11' from dual union all
select 1, 3, 'C', date '2016-10-12' from dual union all
select 1, 4, 'D', date '2016-10-13' from dual union all
select 2, 1, 'Y', date '2016-02-02' from dual union all
select 2, 2, 'X', date '2016-02-03' from dual union all
select 2, 3, 'X', date '2016-02-04' from dual union all
select 3, 1, 'H', date '2016-06-20' from dual union all
select 3, 2, 'G', date '2016-06-21' from dual union all
select 3, 3, 'T', date '2016-06-22' from dual union all
select 3, 4, 'K', date '2016-06-23' from dual union all
select 3, 5, 'K', date '2016-06-24' from dual union all
select 3, 6, 'K', date '2016-06-25' from dual
)
-- End of test data (not part of the solution). SQL query begins below this line.
select product_id,
max(status) keep (dense_rank last order by change_no) as status,
max(dt) as dt
from (
select product_id, change_no, status, dt,
case when lead(status) over (partition by product_id
order by change_no desc)
= status then 0 else 1 end as flag
from test_data
)
where flag = 1
group by product_id
order by product_id -- if needed
;
Output
PRODUCT_ID STATUS DT
---------- ------ ----------
1 D 13/10/2016
2 X 03/02/2016
3 K 23/06/2016
SELECT * FROM (
SELECT PRODUCT_ID, CHANGE_NO, STATUS,DATE, MIN(DATE) OVER(PARTITION BY PRODUCT_ID,STATUS) as MIN_DATE_OF_LATEST_STATUS
FROM (SELECT PRODUCT_ID, CHANGE_NO, STATUS, DATE
,FIRST_VALUE(STATUS) OVER(PARTITION BY PRODUCT_ID ORDER BY CHANGE_NO DESC) LATEST_STATUS
FROM TABLE
) T
WHERE STATUS = LATEST_STATUS
) T
WHERE DATE = MIN_DATE_OF_LATEST_STATUS
Use the FIRST_VALUE window function to get the latest status for each product_id
Get the MIN date for those status rows
Finally get those rows where min_date = date
If change_no isn't needed in the final result, the query can be simplified to
SELECT PRODUCT_ID, STATUS, MIN(DATE) as MIN_DATE_OF_LATEST_STATUS
FROM (SELECT PRODUCT_ID, CHANGE_NO, STATUS, DATE
,FIRST_VALUE(STATUS) OVER(PARTITION BY PRODUCT_ID ORDER BY CHANGE_NO DESC) LATEST_STATUS
FROM TABLE
) T
WHERE STATUS = LATEST_STATUS
GROUP BY PRODUCT_ID, STATUS
Need Suggestion to make it dynamic On Dates.
Expected:
Date, Total Sellers, Sellers From Previous Date
Currently:
Data in table(active_seller_codes): date, seller_code
Queries:
-- Date Wise Sellers Count
select date,count(distinct seller_code) as Sellers_COunt
from active_seller_codes where date between '2016-12-15' AND '2016-12-15'
-- Sellers from previous Days
select date,count(distinct seller_code) as Last_Day_Seller
from active_seller_codes
where date between '2016-12-15' AND '2016-12-15'
and seller_code IN(
select seller_code from active_seller_codes
where date between '2016-12-14' AND '2016-12-14'
)
group by 1
Database Using: Vertica
Reading attentively, you seem to want one row in the report, with the data from the search date in the first two columns and the data of the day before the search date in the third and fourth column, like so:
sales_date|sellers_count|prev_date |prev_sellers_count
2016-12-15| 8|2016-12-14| 5
The solution could be something like this (without the first Common Table Expression, which, in my case, contains the data, but in your case, the data would be in your active_seller_codes table.
WITH
-- initial input
(sales_date,seller_code) AS (
SELECT DATE '2016-12-15',42
UNION ALL SELECT DATE '2016-12-15',43
UNION ALL SELECT DATE '2016-12-15',44
UNION ALL SELECT DATE '2016-12-15',45
UNION ALL SELECT DATE '2016-12-15',46
UNION ALL SELECT DATE '2016-12-15',47
UNION ALL SELECT DATE '2016-12-15',48
UNION ALL SELECT DATE '2016-12-15',49
UNION ALL SELECT DATE '2016-12-14',42
UNION ALL SELECT DATE '2016-12-14',44
UNION ALL SELECT DATE '2016-12-14',46
UNION ALL SELECT DATE '2016-12-14',48
UNION ALL SELECT DATE '2016-12-14',50
UNION ALL SELECT DATE '2016-12-13',42
UNION ALL SELECT DATE '2016-12-13',43
UNION ALL SELECT DATE '2016-12-13',44
UNION ALL SELECT DATE '2016-12-13',45
UNION ALL SELECT DATE '2016-12-13',46
UNION ALL SELECT DATE '2016-12-13',47
UNION ALL SELECT DATE '2016-12-13',48
UNION ALL SELECT DATE '2016-12-13',49
)
,
-- search argument this, in the real query, would come just after the WITH keyword
-- as the above would be the source table
search_dt(search_dt) AS (SELECT DATE '2016-12-15')
,
-- the two days we're interested in, de-duped
distinct_two_days AS (
SELECT DISTINCT
sales_date
, seller_code
FROM active_seller_codes
WHERE sales_date IN (
SELECT search_dt FROM search_dt -- the search date
UNION ALL SELECT search_dt - 1 FROM search_dt -- the day before
)
)
,
-- the two days we want one above the other,
-- with index for the final pivot
vertical AS (
SELECT
ROW_NUMBER() OVER (ORDER BY sales_date DESC) AS idx
, sales_date
, count(DISTINCT seller_code) AS seller_count
FROM distinct_two_days
GROUP BY 2
)
SELECT
MAX(CASE idx WHEN 1 THEN sales_date END) AS sales_date
, SUM(CASE idx WHEN 1 THEN seller_count END) AS sellers_count
, MAX(CASE idx WHEN 2 THEN sales_date END) AS prev_date
, SUM(CASE idx WHEN 2 THEN seller_count END) AS prev_sellers_count
FROM vertical
;
sales_date|sellers_count|prev_date |prev_sellers_count
2016-12-15| 8|2016-12-14| 5