Get data from last year - google-bigquery

I am trying to get data from one year before a given date in BigQuery.
date(DATE_ADD(date(DATE_ADD(timestamp(New_date),(DATEDIFF(timestamp(New_date), CURRENT_DATE())-1), "DAY")), -354, "DAY"))
The query above works, but when I try to put it in a case statement I get null.
sum(case when date(New_day) = date(DATE_ADD(date(DATE_ADD(timestamp(New_date),(DATEDIFF(timestamp(New_date), CURRENT_DATE())-1), "DAY")), -354, "DAY")) then 'this is working' else null end) as lastyeardata
How do I get data for 1 year ago on the same day to do a YoY comparison?
Edit, following Mikhail's suggestions:
#standardSQL
WITH `yourTable` AS (
SELECT 1 AS id, '2017-08-14' AS New_date, '1234' AS volume UNION ALL
SELECT 2 AS id, '2017-08-13' AS New_date, '2345' AS volume UNION ALL
SELECT 3 AS id, '2017-08-14' AS New_date, '3456' AS volume UNION ALL
SELECT 4 AS id, '2017-08-14' AS New_date, '4567' AS volume UNION ALL
SELECT 5 AS id, '2016-08-14' AS New_date, '5678' AS volume UNION ALL
SELECT 6 AS id, '2016-08-13' AS New_date, '6789' AS volume UNION ALL
SELECT 7 AS id, '2016-08-12' AS New_date, '6789' AS volume UNION ALL
SELECT 8 AS id, '2016-08-11' AS New_date, '1011' AS volume
)
select
New_date
,volume as thisyeardata
,Case
WHEN PARSE_DATE('%Y-%m-%d', New_date) = DATE_ADD(PARSE_DATE('%Y-%m-%d', New_date), INTERVAL -1 YEAR)
THEN volume
ELSE NULL end as lastyearvolume
,DATE_ADD(PARSE_DATE('%Y-%m-%d', New_date), INTERVAL -1 YEAR) as lastyear
FROM `yourTable`
For some reason lastyearvolume is giving me null.

Below is for BigQuery Standard SQL (which the BigQuery team recommends using instead of Legacy SQL)
#standardSQL
WITH `yourTable` AS (
SELECT 1 AS id, '2017-08-14' AS New_date, '2016-08-14' AS New_day UNION ALL
SELECT 2 AS id, '2017-08-13' AS New_date, '2016-08-13' AS New_day UNION ALL
SELECT 3 AS id, '2017-08-14' AS New_date, '2016-08-12' AS New_day UNION ALL
SELECT 4 AS id, '2017-08-14' AS New_date, '2016-08-11' AS New_day
)
SELECT
COUNT(
CASE
WHEN PARSE_DATE('%Y-%m-%d', New_day) = DATE_ADD(PARSE_DATE('%Y-%m-%d', New_date), INTERVAL -1 YEAR)
THEN 'this is working'
ELSE NULL
END
) AS lastyeardata
FROM `yourTable`
As you would expect, only two rows are counted, because only they have New_date and New_day exactly one year apart; the rest are not.
The example above assumes your date fields are of STRING type.
If they are of DATE type, just omit the PARSE_DATE function.
DATE_ADD has a counterpart, DATE_SUB, which can be used instead: DATE_SUB(PARSE_DATE('%Y-%m-%d', New_date), INTERVAL 1 YEAR)
Update based on your updated example:
A simple and fast way to make your query work is to join the table to itself:
#standardSQL
WITH `yourTable` AS (
SELECT 1 AS id, '2017-08-13' AS New_date, '1234' AS volume UNION ALL
SELECT 2 AS id, '2017-08-14' AS New_date, '2345' AS volume UNION ALL
SELECT 3 AS id, '2017-08-15' AS New_date, '3456' AS volume UNION ALL
SELECT 4 AS id, '2017-08-16' AS New_date, '4567' AS volume UNION ALL
SELECT 5 AS id, '2016-08-13' AS New_date, '5678' AS volume UNION ALL
SELECT 6 AS id, '2016-08-14' AS New_date, '6789' AS volume UNION ALL
SELECT 7 AS id, '2016-08-15' AS New_date, '6789' AS volume UNION ALL
SELECT 8 AS id, '2016-08-16' AS New_date, '1011' AS volume
)
SELECT
this_year.New_date
,this_year.volume AS thisyeardata
,last_year.volume AS lastyeardata
FROM `yourTable` AS this_year
JOIN `yourTable` AS last_year
ON PARSE_DATE('%Y-%m-%d', last_year.New_date) = DATE_ADD(PARSE_DATE('%Y-%m-%d', this_year.New_date), INTERVAL -1 YEAR)
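To try the self-join idea without a BigQuery project, here is a minimal sketch using Python's built-in sqlite3 module on the same sample rows. It is SQLite, not BigQuery: date(x, '-1 year') stands in for DATE_ADD(..., INTERVAL -1 YEAR), the table name your_table is only illustrative, and volume is stored as an integer rather than a string.

```python
import sqlite3

# Same sample rows as the BigQuery example (volume stored as INTEGER here).
rows = [
    (1, "2017-08-13", 1234), (2, "2017-08-14", 2345),
    (3, "2017-08-15", 3456), (4, "2017-08-16", 4567),
    (5, "2016-08-13", 5678), (6, "2016-08-14", 6789),
    (7, "2016-08-15", 6789), (8, "2016-08-16", 1011),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, new_date TEXT, volume INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)

# Self-join: pair each row with the row dated exactly one year earlier.
# SQLite's date(x, '-1 year') plays the role of DATE_ADD(x, INTERVAL -1 YEAR).
yoy = conn.execute("""
    SELECT this_year.new_date,
           this_year.volume AS thisyeardata,
           last_year.volume AS lastyeardata
    FROM your_table AS this_year
    JOIN your_table AS last_year
      ON last_year.new_date = date(this_year.new_date, '-1 year')
    ORDER BY this_year.new_date
""").fetchall()

for row in yoy:
    print(row)
```

Because it is an inner join, the 2016 rows drop out (there are no 2015 rows to match); a LEFT JOIN would keep them with a NULL lastyeardata. Leap days are one place where different engines may round differently, so check 29 February if it matters for your data.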

Related

How to repeat same value based on month date in Bigquery

Can anyone help me do this in BigQuery? I have two tables like this:
01/01/2000
01/02/2000
01/03/2000
01/04/2000
and this
start | end | status
01/01/2000 | 01/02/2000 | a
01/02/2000 | 01/06/2000 | b
I want the result to look like this:
month | status
01/01/2000 | a
01/02/2000 | b
01/03/2000 | b
01/04/2000 | b

You can handle this with a BETWEEN in the join condition. The only caveat is how to handle the overlapping boundary dates (one range's end equals the next range's start). In this case I've subtracted a day from the end date so that it is not included in the range.
with temp1 as(
select '01/01/2000' dt UNION ALL
select '01/02/2000' UNION ALL
select '01/03/2000' UNION ALL
select '01/04/2000'
),
temp2 as (
select '01/01/2000' start, '01/02/2000' end_dt, 'a' status UNION ALL
select '01/02/2000' start, '01/06/2000' end_dt, 'b' status
)
select *
from temp1
join temp2
on parse_date('%d/%m/%Y',temp1.dt) between parse_date('%d/%m/%Y',temp2.start) and date_add(parse_date('%d/%m/%Y', temp2.end_dt), interval -1 day)

Below is for BigQuery Standard SQL
#standardSQL
select month, status
from `project.dataset.tableA`
join `project.dataset.tableB`
on parse_date('%d/%m/%Y', month) >= parse_date('%d/%m/%Y', start)
and parse_date('%d/%m/%Y', month) < parse_date('%d/%m/%Y', `end`)
Applied to the sample data in your question, as in the example below:
#standardSQL
with `project.dataset.tableA` as (
select '01/01/2000' month union all
select '01/02/2000' union all
select '01/03/2000' union all
select '01/04/2000'
), `project.dataset.tableB` as (
select '01/01/2000' start, '01/02/2000' `end`, 'a' status union all
select '01/02/2000', '01/06/2000', 'b'
)
select month, status
from `project.dataset.tableA`
join `project.dataset.tableB`
on parse_date('%d/%m/%Y', month) >= parse_date('%d/%m/%Y', start)
and parse_date('%d/%m/%Y', month) < parse_date('%d/%m/%Y', `end`)
The output is:
month | status
01/01/2000 | a
01/02/2000 | b
01/03/2000 | b
01/04/2000 | b
And by the way, since mid-October 2020, BigQuery Standard SQL supports DATE arithmetic operators, so the query below will also work:
#standardSQL
select month, status
from `project.dataset.tableA`
join `project.dataset.tableB`
on parse_date('%d/%m/%Y', month)
between parse_date('%d/%m/%Y', start)
and parse_date('%d/%m/%Y', `end`) - 1
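The >= start / < end condition is a half-open interval test: a month that equals one range's end and the next range's start belongs only to the later range. A minimal pure-Python sketch of the same join on the sample data, assuming the dates are dd/mm/yyyy strings as the parse_date('%d/%m/%Y', ...) calls imply:

```python
from datetime import datetime


def parse(d):
    # Sample data is dd/mm/yyyy, matching parse_date('%d/%m/%Y', ...).
    return datetime.strptime(d, "%d/%m/%Y").date()


months = ["01/01/2000", "01/02/2000", "01/03/2000", "01/04/2000"]
ranges = [("01/01/2000", "01/02/2000", "a"), ("01/02/2000", "01/06/2000", "b")]

# Half-open test: start <= month < end. A boundary date goes to exactly one range.
result = [
    (m, status)
    for m in months
    for start, end, status in ranges
    if parse(start) <= parse(m) < parse(end)
]

for row in result:
    print(row)
```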

Identify contiguous and discontinuous date ranges

I have a table named x. The data is as follows:
Account_num start_dt end_dt
A111326 02/01/2016 02/11/2016
A111326 02/12/2016 03/05/2016
A111326 03/02/2016 03/16/2016
A111331 02/28/2016 02/29/2016
A111331 02/29/2016 03/29/2016
A999999 08/25/2015 08/25/2015
A999999 12/19/2015 12/22/2015
A222222 11/06/2015 11/10/2015
A222222 05/16/2016 05/17/2016
Both A111326 and A111331 should be identified as contiguous data, and A999999 and
A222222 should be identified as discontinuous data. In my code I currently use the following query to identify discontinuous data, but A111326 is erroneously identified as discontinuous too. Please help me modify the code below so that A111326 is not identified as discontinuous data. Thanks in advance for your help.
(SELECT account_num
FROM (SELECT account_num,
(MAX (
END_DT)
OVER (PARTITION BY account_num
ORDER BY START_DT))
START_DT,
(LEAD (
START_DT)
OVER (PARTITION BY account_num
ORDER BY START_DT))
END_DT
FROM x
WHERE (START_DT + 1) <=
(END_DT - 1))
WHERE START_DT < END_DT);

Oracle Setup:
CREATE TABLE accounts ( Account_num, start_dt, end_dt ) AS
SELECT 'A', DATE '2016-02-01', DATE '2016-02-11' FROM DUAL UNION ALL
SELECT 'A', DATE '2016-02-12', DATE '2016-03-05' FROM DUAL UNION ALL
SELECT 'A', DATE '2016-03-02', DATE '2016-03-16' FROM DUAL UNION ALL
SELECT 'B', DATE '2016-02-28', DATE '2016-02-29' FROM DUAL UNION ALL
SELECT 'B', DATE '2016-02-29', DATE '2016-03-29' FROM DUAL UNION ALL
SELECT 'C', DATE '2015-08-25', DATE '2015-08-25' FROM DUAL UNION ALL
SELECT 'C', DATE '2015-12-19', DATE '2015-12-22' FROM DUAL UNION ALL
SELECT 'D', DATE '2015-11-06', DATE '2015-11-10' FROM DUAL UNION ALL
SELECT 'D', DATE '2016-05-16', DATE '2016-05-17' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-01', DATE '2016-01-02' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-05', DATE '2016-01-06' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-03', DATE '2016-01-07' FROM DUAL;
Query:
WITH times ( account_num, dt, lvl ) AS (
SELECT Account_num, start_dt - 1, 1 FROM accounts
UNION ALL
SELECT Account_num, end_dt, -1 FROM accounts
)
, totals ( account_num, dt, total ) AS (
SELECT account_num,
dt,
SUM( lvl ) OVER ( PARTITION BY Account_num ORDER BY dt, lvl DESC )
FROM times
)
SELECT Account_num,
CASE WHEN COUNT( CASE total WHEN 0 THEN 1 END ) > 1
THEN 'N'
ELSE 'Y'
END AS is_contiguous
FROM totals
GROUP BY Account_Num
ORDER BY Account_Num;
Output:
ACCOUNT_NUM IS_CONTIGUOUS
----------- -------------
A Y
B Y
C N
D N
E Y
Alternative Query:
(It's exactly the same method, just using UNPIVOT rather than UNION ALL.)
SELECT Account_num,
CASE WHEN COUNT( CASE total WHEN 0 THEN 1 END ) > 1
THEN 'N'
ELSE 'Y'
END AS is_contiguous
FROM (
SELECT Account_num,
SUM( lvl ) OVER ( PARTITION BY Account_Num
ORDER BY CASE lvl WHEN 1 THEN dt - 1 ELSE dt END,
lvl DESC
) AS total
FROM accounts
UNPIVOT ( dt FOR lvl IN ( start_dt AS 1, end_dt AS -1 ) )
)
GROUP BY Account_Num
ORDER BY Account_Num;
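The +1/-1 running sum in the Oracle answer is a sweep over start and end events: each start_dt - 1 opens a range, each end_dt closes one, and the running total only drops back to zero at a gap or at the very end. A rough pure-Python sketch of that logic on the same sample accounts, offered as an illustration rather than an exact port of the Oracle query:

```python
from datetime import date, timedelta

# Same sample ranges as the Oracle setup: account -> list of (start, end).
accounts = {
    "A": [(date(2016, 2, 1), date(2016, 2, 11)),
          (date(2016, 2, 12), date(2016, 3, 5)),
          (date(2016, 3, 2), date(2016, 3, 16))],
    "B": [(date(2016, 2, 28), date(2016, 2, 29)),
          (date(2016, 2, 29), date(2016, 3, 29))],
    "C": [(date(2015, 8, 25), date(2015, 8, 25)),
          (date(2015, 12, 19), date(2015, 12, 22))],
    "D": [(date(2015, 11, 6), date(2015, 11, 10)),
          (date(2016, 5, 16), date(2016, 5, 17))],
    "E": [(date(2016, 1, 1), date(2016, 1, 2)),
          (date(2016, 1, 5), date(2016, 1, 6)),
          (date(2016, 1, 3), date(2016, 1, 7))],
}


def is_contiguous(spans):
    events = []
    for start, end in spans:
        # +1 the day before the range starts (start_dt - 1), so a range that
        # begins the day after another ends still counts as touching it.
        events.append((start - timedelta(days=1), 1))
        # -1 on the day the range ends.
        events.append((end, -1))
    # Order by date, with +1 before -1 on the same date (ORDER BY dt, lvl DESC).
    events.sort(key=lambda e: (e[0], -e[1]))
    total = 0
    zeros = 0
    for _, lvl in events:
        total += lvl
        if total == 0:
            zeros += 1
    # The running total always finishes at 0; any extra 0 means a gap.
    return zeros == 1


results = {acct: is_contiguous(spans) for acct, spans in accounts.items()}
print(results)
```

This version walks the events one at a time, so each zero of the running total is counted exactly once.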

WITH cte AS (
SELECT
AccountNumber
,CASE
WHEN
LAG(End_Dt) OVER (PARTITION BY AccountNumber ORDER BY End_Dt) IS NULL THEN 0
WHEN
LAG(End_Dt) OVER (PARTITION BY AccountNumber ORDER BY End_Dt) >= DATEADD(day, -1, Start_Dt) THEN 0 -- DATEADD also works for DATE columns, where Start_Dt - 1 would fail
ELSE 1
END as discontiguous
FROM
#Table
)
SELECT
AccountNumber
,CASE WHEN SUM(discontiguous) > 0 THEN 'discontiguous' ELSE 'contiguous' END
FROM
cte
GROUP BY
AccountNumber;
One of your problems is that your desired contiguous result also includes overlapping date ranges in your example data set. For example, the third A111326 row starts on 3/2/2016, but the row before it ends on 3/5/2016, so the two ranges overlap from 3/2 to 3/5.

How to fetch records that have an alternate entry

I need help fetching records that have an alternating set of entries associated with a unique value (e.g. user_id).
I want the output to be only (1111, 2222, 3333).
Here is the scenario:
user_id 1111 attended a .net course from 2005-01-01 to 2006-12-31.
He later attended java from 2007-01-01 to 2009-12-31.
He later came back to .net.
So I want to retrieve these kinds of user_ids.
user_id 4444 should not be in the output, because there are no alternating courses.
UPDATE: 4444 attended a Java course from 2007 to 2009, and again
attended Java from 2010 to 2012. Later he attended .net but never came
back to Java, so he must be excluded from the output.
If GROUP BY is used, it groups records regardless of whether the course names alternate.
We could create a procedure that loops over the rows and compares the alternating course names, but I want to know whether a single query can do this.

You can use two INNER JOIN operations:
SELECT DISTINCT user_id
FROM mytable AS t1
INNER JOIN mytable AS t2
ON t1.user_id = t2.user_id AND t1.id < t2.id AND t1.course_name <> t2.course_name
INNER JOIN mytable AS t3
ON t2.user_id = t3.user_id AND t2.id < t3.id AND t1.course_name = t3.course_name
I assume that id is an auto-increment field that reflects the order the rows have been inserted in the DB. Otherwise, you should use a date field in its place.

Same as Giorgos Betsos' answer, only with select distinct to prevent duplicates.
SELECT DISTINCT user_id
FROM mytable AS t1
INNER JOIN mytable AS t2
ON t1.user_id = t2.user_id AND t1.Start_Date < t2.Start_Date AND
t1.course_name <> t2.course_name
INNER JOIN mytable AS t3
ON t2.user_id = t3.user_id AND t2.Start_Date < t3.Start_Date AND
t1.course_name = t3.course_name
EDIT: Using Start_Date since the question has been updated and IDs are not necessarily sequential.

This version uses windowed (analytic) functions instead of multiple self-joins:
SELECT DISTINCT user_id
FROM
(
SELECT user_id
,course_name
,start_date
,RANK() -- number all courses
OVER (PARTITION BY user_id
ORDER BY start_date)
-
RANK() -- number each course
OVER (PARTITION BY user_id, course_name
ORDER BY start_date) AS x
FROM tab
) dt
GROUP BY user_id, course_name
HAVING MIN(x) <> MAX(x) -- same course but another inbetween
If a user takes the same course several times in a row, x stays the same; if another course came in between, x changes. For example, java, java, .net, java:
java 1 - 1 = 0
java 2 - 2 = 0 <--- min
.net 3 - 1 = 2
java 4 - 3 = 1 <--- max
versus java, java, .net, .net (no alternation, so MIN(x) = MAX(x) for each course):
java 1 - 1 = 0
java 2 - 2 = 0
.net 3 - 1 = 2
.net 4 - 2 = 2
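To see the RANK() difference trick in action, here is a small pure-Python sketch that computes x for each row and applies the MIN(x) <> MAX(x) test. It reuses the sample rows from the single-table-scan answer, and it assumes no two courses for a user share a start_date, so RANK() behaves like a plain row number.

```python
from collections import defaultdict

# (user_id, start_date, course_name); ISO date strings sort chronologically.
enrolments = [
    (1111, "2005-01-01", ".net"), (1111, "2007-01-01", "java"),
    (1111, "2010-01-01", ".net"),
    (2222, "2005-01-01", "java"), (2222, "2007-01-01", ".net"),
    (2222, "2009-01-01", ".net"), (2222, "2013-01-01", "java"),
    (3333, "2005-01-01", "java"), (3333, "2007-01-01", ".net"),
    (3333, "2009-01-01", "java"), (3333, "2014-01-01", ".net"),
    (4444, "2007-01-01", "java"), (4444, "2010-01-01", "java"),
    (4444, "2013-01-01", ".net"), (4444, "2016-01-01", ".net"),
]


def alternating_users(rows):
    by_user = defaultdict(list)
    for user_id, start, course in rows:
        by_user[user_id].append((start, course))
    result = []
    for user_id in sorted(by_user):
        per_course = defaultdict(int)  # running count within each course
        xs = defaultdict(set)          # course -> distinct values of x
        # overall = rank among all of the user's courses (by start_date)
        for overall, (_, course) in enumerate(sorted(by_user[user_id]), start=1):
            per_course[course] += 1
            xs[course].add(overall - per_course[course])
        # More than one x for a course means another course came in between
        # (the HAVING MIN(x) <> MAX(x) test).
        if any(len(v) > 1 for v in xs.values()):
            result.append(user_id)
    return result


print(alternating_users(enrolments))
```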

This uses a single table scan and does not rely on GROUP BY:
WITH table_name ( user_id, start_date, end_date, course_name, id ) AS (
SELECT 1111, DATE '2005-01-01', DATE '2006-12-31', '.net', 1 FROM DUAL UNION ALL
SELECT 1111, DATE '2007-01-01', DATE '2009-12-31', 'java', 2 FROM DUAL UNION ALL
SELECT 1111, DATE '2010-01-01', DATE '2020-12-31', '.net', 3 FROM DUAL UNION ALL
SELECT 2222, DATE '2005-01-01', DATE '2006-12-31', 'java', 4 FROM DUAL UNION ALL
SELECT 2222, DATE '2007-01-01', DATE '2008-12-31', '.net', 5 FROM DUAL UNION ALL
SELECT 2222, DATE '2009-01-01', DATE '2012-12-31', '.net', 6 FROM DUAL UNION ALL
SELECT 2222, DATE '2013-01-01', DATE '2016-12-31', 'java', 7 FROM DUAL UNION ALL
SELECT 3333, DATE '2005-01-01', DATE '2007-12-31', 'java', 8 FROM DUAL UNION ALL
SELECT 3333, DATE '2007-01-01', DATE '2008-12-31', '.net', 9 FROM DUAL UNION ALL
SELECT 3333, DATE '2009-01-01', DATE '2013-12-31', 'java', 10 FROM DUAL UNION ALL
SELECT 3333, DATE '2014-01-01', DATE '2016-12-31', '.net', 11 FROM DUAL UNION ALL
SELECT 4444, DATE '2007-01-01', DATE '2009-12-31', 'java', 12 FROM DUAL UNION ALL
SELECT 4444, DATE '2010-01-01', DATE '2012-12-31', 'java', 13 FROM DUAL UNION ALL
SELECT 4444, DATE '2013-01-01', DATE '2015-12-31', '.net', 14 FROM DUAL UNION ALL
SELECT 4444, DATE '2016-01-01', DATE '2016-12-31', '.net', 15 FROM DUAL
)
SELECT DISTINCT user_id
FROM (
SELECT user_id,
LEAD( course_name )
OVER ( PARTITION BY user_id, course_name ORDER BY start_date )
AS next_same_course,
LEAD( course_name )
OVER ( PARTITION BY user_id ORDER BY start_date )
AS next_course
FROM table_name
)
WHERE next_same_course IS NOT NULL
AND next_course <> next_same_course;

How do I write an SQL to get a cumulative value and a monthly total in one row?

Say, I have the following data:
select 1 id, date '2007-01-16' date_created, 5 sales, 'Bob' name from dual union all
select 2 id, date '2007-04-16' date_created, 2 sales, 'Bob' name from dual union all
select 3 id, date '2007-05-16' date_created, 6 sales, 'Bob' name from dual union all
select 4 id, date '2007-05-21' date_created, 4 sales, 'Bob' name from dual union all
select 5 id, date '2013-07-16' date_created, 24 sales, 'Bob' name from dual union all
select 6 id, date '2007-01-17' date_created, 15 sales, 'Ann' name from dual union all
select 7 id, date '2007-04-17' date_created, 12 sales, 'Ann' name from dual union all
select 8 id, date '2007-05-17' date_created, 16 sales, 'Ann' name from dual union all
select 9 id, date '2007-05-22' date_created, 14 sales, 'Ann' name from dual union all
select 10 id, date '2013-07-17' date_created, 34 sales, 'Ann' name from dual
I want to get results like the following:
Name Total_cumulative_sales Total_sales_current_month
Bob 41 24
Ann 91 34
In this table, Bob's total sales since the beginning are 41, and his sales for the entire current month (July) are 24. The same goes for Ann.
How do I write an SQL to get this result?

Try this way:
select name, sum(sales) as Total_cumulative_sales ,
sum(
case trunc(date_created, 'MM') -- date_created is already a DATE, so no to_date() is needed
when trunc(sysdate, 'MM') then sales
else 0
end
) as Total_sales_current_month
from tab
group by name

SELECT Name,
SUM(Sales) Total_sales,
SUM(CASE WHEN MONTH(date_created) = MONTH(GetDate()) AND YEAR(date_created) = YEAR(GetDate()) THEN Sales END) Total_sales_current_month
FROM tab -- placeholder: use your actual table name
GROUP BY Name
This is SQL Server syntax (MONTH, YEAR and GETDATE are T-SQL functions). It should work, but there's probably a more elegant way to specify "in the current month".
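The conditional-aggregation idea (a plain SUM for the cumulative total plus a SUM over a CASE for the current month) can be tried with Python's sqlite3 module. This is a SQLite sketch, not Oracle or SQL Server: strftime('%Y-%m', ...) replaces trunc(..., 'MM'), the table name sales_data is illustrative, and the current month is passed in as a parameter because the sample data ends in July 2013.

```python
import sqlite3

rows = [
    (1, "2007-01-16", 5, "Bob"), (2, "2007-04-16", 2, "Bob"),
    (3, "2007-05-16", 6, "Bob"), (4, "2007-05-21", 4, "Bob"),
    (5, "2013-07-16", 24, "Bob"), (6, "2007-01-17", 15, "Ann"),
    (7, "2007-04-17", 12, "Ann"), (8, "2007-05-17", 16, "Ann"),
    (9, "2007-05-22", 14, "Ann"), (10, "2013-07-17", 34, "Ann"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (id INTEGER, date_created TEXT, sales INTEGER, name TEXT)"
)
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?)", rows)

# Assumption: "current month" is supplied explicitly so the 2013 sample data
# reproduces the expected figures; Oracle would use trunc(sysdate, 'MM').
current_month = "2013-07"

result = conn.execute(
    """
    SELECT name,
           SUM(sales) AS total_cumulative_sales,
           SUM(CASE WHEN strftime('%Y-%m', date_created) = ? THEN sales ELSE 0 END)
               AS total_sales_current_month
    FROM sales_data
    GROUP BY name
    ORDER BY name
    """,
    (current_month,),
).fetchall()

print(result)
```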

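This should work for sales spanning any number of years: it computes the cumulative total up to each salesperson's latest month with sales (max_mon), together with that month's total. Note that max_mon is the latest month with data for each person, not necessarily the current calendar month.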
WITH sales AS
(select 1 id, date '2007-01-16' date_created, 5 sales, 'Bob' sales_name from dual union all
select 2 id, date '2007-04-16' date_created, 2 sales, 'Bob' sales_name from dual union all
select 3 id, date '2007-05-16' date_created, 6 sales, 'Bob' sales_name from dual union all
select 4 id, date '2007-05-21' date_created, 4 sales, 'Bob' sales_name from dual union all
select 5 id, date '2013-07-16' date_created, 24 sales, 'Bob' sales_name from dual union all
select 6 id, date '2007-01-17' date_created, 15 sales, 'Ann' sales_name from dual union all
select 7 id, date '2007-04-17' date_created, 12 sales, 'Ann' sales_name from dual union all
select 8 id, date '2007-05-17' date_created, 16 sales, 'Ann' sales_name from dual union all
select 9 id, date '2007-05-22' date_created, 14 sales, 'Ann' sales_name from dual union all
select 10 id, date '2013-07-17' date_created, 34 sales, 'Ann' sales_name from dual)
SELECT sales_name
,total_sales
,monthly_sales
,mon
FROM (SELECT sales_name
,SUM(sales) OVER (PARTITION BY sales_name ORDER BY mon) total_sales
,SUM(sales) OVER (PARTITION BY sales_name,mon ORDER BY mon) monthly_sales
,mon
,max_mon
FROM ( SELECT sales_name
,sum(sales) sales
,mon
,max_mon
FROM (SELECT sales_name
,to_number(to_char(date_created,'YYYYMM')) mon
,sales
,MAX(to_number(to_char(date_created,'YYYYMM'))) OVER (PARTITION BY sales_name) max_mon
FROM sales
ORDER BY 2)
GROUP BY sales_name
,max_mon
,mon
)
)
WHERE max_mon = mon
;

SQL - Get Min, Max date for a given group with break in dates

I'm trying to find the min and max process date for the following data, for a given value, with breaks in the process date. Note that rows are not processed on weekends; I don't want to break them into two different sets across a weekend if they have the same value.
SELECT 1, 'A',to_date('10/01/2012','dd/mm/yyyy'), 10, to_date('11/01/2012','dd/mm/yyyy') FROm DUAL
UNION ALL SELECT 1, 'A',to_date('11/01/2012','dd/mm/yyyy'), 10, to_date('12/01/2012','dd/mm/yyyy') FROm DUAL
UNION ALL SELECT 1, 'A',to_date('12/01/2012','dd/mm/yyyy'), 9, to_date('13/01/2012','dd/mm/yyyy') FROm DUAL
UNION ALL SELECT 1, 'A',to_date('13/01/2012','dd/mm/yyyy'), 9, to_date('14/01/2012','dd/mm/yyyy') FROm DUAL
UNION ALL SELECT 1, 'A',to_date('16/01/2012','dd/mm/yyyy'), 9, to_date('17/01/2012','dd/mm/yyyy') FROm DUAL
UNION ALL SELECT 1, 'A',to_date('17/01/2012','dd/mm/yyyy'), 10, to_date('18/01/2012','dd/mm/yyyy') FROm DUAL
UNION ALL SELECT 1, 'A',to_date('18/01/2012','dd/mm/yyyy'), 10, to_date('19/01/2012','dd/mm/yyyy') FROm DUAL;
My attempt (which I know is wrong):
SELECT id, cd, value, min(p_dt) min_dt, max(p_dt) max_dt FROM T
group by id, cd, value;
This returns
ID CD VALUE MIN_DT MAX_DT
----------------------------------------------------------------------------------
1 A 9 January, 12 2012 00:00:00+0000 January, 16 2012 00:00:00+0000
1 A 10 January, 10 2012 00:00:00+0000 January, 18 2012 00:00:00+0000
What i want to return is
ID CD VALUE MIN_DT MAX_DT
----------------------------------------------------------------------------------
1 A 9 January, 12 2012 00:00:00+0000 January, 16 2012 00:00:00+0000
1 A 10 January, 10 2012 00:00:00+0000 January, 11 2012 00:00:00+0000
1 A 10 January, 17 2012 00:00:00+0000 January, 18 2012 00:00:00+0000
I tried different ways to query this, but I couldn't come up with a working query.

Not sure exactly what you want... Your data isn't suitable for partitioning by dates: your dates are unique, unless you meant that i_dt must equal p_dt. Even if you partition by dates instead of values, you will get all rows back, just as with a simple select.
In my example I partition by value. There can be only one max and one min date per distinct value. Examine the output:
SELECT id, cd, i_dt, p_dt, value
, To_Char(MIN(p_dt) OVER (PARTITION BY value), 'Mon, DD YYYY HH24:MI:SS') min_dt
, To_Char(MAX(p_dt) OVER (PARTITION BY value), 'Mon, DD YYYY HH24:MI:SS') max_dt
FROM t
/
ID CD I_DT P_DT VALUE MIN_DT MAX_DT
---------------------------------------------------------------------------------------
1 A 1/14/2012 1/13/2012 9 Jan, 12 2012 00:00:00 Jan, 16 2012 00:00:00
1 A 1/17/2012 1/16/2012 9 Jan, 12 2012 00:00:00 Jan, 16 2012 00:00:00
1 A 1/13/2012 1/12/2012 9 Jan, 12 2012 00:00:00 Jan, 16 2012 00:00:00
1 A 1/19/2012 1/18/2012 10 Jan, 10 2012 00:00:00 Jan, 18 2012 00:00:00
1 A 1/18/2012 1/17/2012 10 Jan, 10 2012 00:00:00 Jan, 18 2012 00:00:00
1 A 1/12/2012 1/11/2012 10 Jan, 10 2012 00:00:00 Jan, 18 2012 00:00:00
1 A 1/11/2012 1/10/2012 10 Jan, 10 2012 00:00:00 Jan, 18 2012 00:00:00

There are a number of other questions on this site that solve the same problem, several of which I have answered.
This question is a little more complicated because of the requirement to ignore weekends. That part seems relatively simple to solve, as I explain below.
Your question doesn't include column names for all of the columns in your table. I have assumed that the first date is the process date and that the other date is not important for this query. This might be the wrong assumption.
From the question, it looks like a group continues if, for a weekday (Mon-Thu), there is a matching row on the next day. For a Friday, there needs to be a matching row on the following Monday. I handle this by adding 3 days if it is a Friday, or one day in every other case.
An example query is shown below.
Hopefully this solves your problem.
with test_data as (
SELECT 1 as id, 'A' as cd,to_date('10/01/2012','dd/mm/yyyy') as p_date, 10 as value, to_date('11/01/2012','dd/mm/yyyy') as some_other_date FROm DUAL UNION ALL
SELECT 1 as id, 'A' as cd,to_date('11/01/2012','dd/mm/yyyy') as p_date, 10 as value, to_date('12/01/2012','dd/mm/yyyy') as some_other_date FROm DUAL UNION ALL
SELECT 1 as id, 'A' as cd,to_date('12/01/2012','dd/mm/yyyy') as p_date, 9 as value, to_date('13/01/2012','dd/mm/yyyy') as some_other_date FROm DUAL UNION ALL
SELECT 1 as id, 'A' as cd,to_date('13/01/2012','dd/mm/yyyy') as p_date, 9 as value, to_date('14/01/2012','dd/mm/yyyy') as some_other_date FROm DUAL UNION ALL
SELECT 1 as id, 'A' as cd,to_date('16/01/2012','dd/mm/yyyy') as p_date, 9 as value, to_date('17/01/2012','dd/mm/yyyy') as some_other_date FROm DUAL UNION ALL
SELECT 1 as id, 'A' as cd,to_date('17/01/2012','dd/mm/yyyy') as p_date, 10 as value, to_date('18/01/2012','dd/mm/yyyy') as some_other_date FROm DUAL UNION ALL
SELECT 1 as id, 'A' as cd,to_date('18/01/2012','dd/mm/yyyy') as p_date, 10 as value, to_date('19/01/2012','dd/mm/yyyy') as some_other_date FROm DUAL
)
select
id,
cd,
value,
block_num,
min(p_date) as process_start_date,
max(p_date) as process_end_date
from (
select
id,
cd,
value,
p_date,
sum(is_block_start) over (partition by id, cd, value order by p_date) as block_num
from (
select
id,
cd,
value,
p_date,
-- get end date of previous block
case when lag(case when to_char(p_date, 'DY') = 'FRI' then p_date+3 else p_date+1 end)
over (partition by id, cd, value order by p_date) = p_date then 0 else 1 end as is_block_start
from test_data
-- Make sure that the data definitely doesn't include Sat or Sun because this could just confuse things
where to_char(p_date, 'DY') not in ('SAT', 'SUN')
)
)
group by id, cd, value, block_num
order by id, cd, value, block_num
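The block logic above (a row continues a block when its date equals the previous row's date plus one day, or plus three days after a Friday) can be sketched in pure Python. Like the answer, it assumes the first date is the process date and that there are no weekend rows; the helper names are illustrative.

```python
from datetime import date, timedelta
from itertools import groupby

# (id, cd, p_date, value) from the sample data; the second date is ignored.
rows = [
    (1, "A", date(2012, 1, 10), 10),
    (1, "A", date(2012, 1, 11), 10),
    (1, "A", date(2012, 1, 12), 9),
    (1, "A", date(2012, 1, 13), 9),
    (1, "A", date(2012, 1, 16), 9),
    (1, "A", date(2012, 1, 17), 10),
    (1, "A", date(2012, 1, 18), 10),
]


def next_business_day(d):
    # Friday (weekday() == 4) is followed by Monday; other weekdays by the next day.
    return d + timedelta(days=3 if d.weekday() == 4 else 1)


def group_key(r):
    return (r[0], r[1], r[3])  # (id, cd, value)


def blocks(data):
    out = []
    ordered = sorted(data, key=lambda r: (group_key(r), r[2]))
    for (id_, cd, value), grp in groupby(ordered, key=group_key):
        dates = [r[2] for r in grp]
        start = prev = dates[0]
        for d in dates[1:]:
            if d != next_business_day(prev):
                # Gap found: close the current block and start a new one.
                out.append((id_, cd, value, start, prev))
                start = d
            prev = d
        out.append((id_, cd, value, start, prev))
    return out


for block in blocks(rows):
    print(block)
```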

Here is an answer using analytic functions.
With your sample data...
WITH
tbl (ID, CD, P_DATE, A_VALUE, I_DATE) AS
(
SELECT 1, 'A',to_date('10/01/2012','dd/mm/yyyy'), 10, to_date('11/01/2012','dd/mm/yyyy') FROm DUAL UNION ALL
SELECT 1, 'A',to_date('11/01/2012','dd/mm/yyyy'), 10, to_date('12/01/2012','dd/mm/yyyy') FROm DUAL UNION ALL
SELECT 1, 'A',to_date('12/01/2012','dd/mm/yyyy'), 9, to_date('13/01/2012','dd/mm/yyyy') FROm DUAL UNION ALL
SELECT 1, 'A',to_date('13/01/2012','dd/mm/yyyy'), 9, to_date('14/01/2012','dd/mm/yyyy') FROm DUAL UNION ALL
SELECT 1, 'A',to_date('16/01/2012','dd/mm/yyyy'), 9, to_date('17/01/2012','dd/mm/yyyy') FROm DUAL UNION ALL
SELECT 1, 'A',to_date('17/01/2012','dd/mm/yyyy'), 10, to_date('18/01/2012','dd/mm/yyyy') FROm DUAL UNION ALL
SELECT 1, 'A',to_date('18/01/2012','dd/mm/yyyy'), 10, to_date('19/01/2012','dd/mm/yyyy') FROm DUAL
),
... create a CTE (grid) with columns PREV_DAY_DIFF and NEXT_DAY_DIFF to handle continuity (taking care of weekends) and, later, to help group the rows based on continuity...
grid AS
( SELECT ID, CD, A_VALUE, P_DATE, To_Char(P_DATE, 'DY') "P_DAY", I_DATE,
--
CASE WHEN To_Char(LAG(P_DATE, 1) OVER(Partition By ID, CD, A_VALUE Order By ID, CD, A_VALUE, P_DATE), 'DY') = 'FRI' THEN 1
WHEN To_Char(LAG(P_DATE, 1) OVER(Partition By ID, CD, A_VALUE Order By ID, CD, A_VALUE, P_DATE), 'DY') Is Null THEN 0
ELSE P_DATE - LAG(P_DATE, 1) OVER(Partition By ID, CD, A_VALUE Order By ID, CD, A_VALUE, P_DATE)
END "PREV_DAY_DIFF",
--
CASE WHEN To_Char(LEAD(P_DATE, 1) OVER(Partition By ID, CD, A_VALUE Order By ID, CD, A_VALUE, P_DATE), 'DY') = 'MON' THEN 1
WHEN To_Char(LEAD(P_DATE, 1) OVER(Partition By ID, CD, A_VALUE Order By ID, CD, A_VALUE, P_DATE), 'DY') Is Null THEN 0
ELSE LEAD(P_DATE, 1) OVER(Partition By ID, CD, A_VALUE Order By ID, CD, A_VALUE, P_DATE) - P_DATE
END "NEXT_DAY_DIFF"
FROM tbl
ORDER BY ID, CD, A_VALUE, P_DATE
)
Main SQL: takes the CTE data (the inner join query) and joins it with your sample data, calculating and selecting distinct groups with min and max dates as asked
SELECT DISTINCT
t.ID, t.CD, t.A_VALUE,
Nvl(g.MIN_P_DATE, LAG(g.MIN_P_DATE) OVER(Partition By t.ID, t.CD, t.A_VALUE Order By t.ID, t.CD, t.A_VALUE, t.P_DATE)) "MIN_P_DATE",
Nvl(g.MAX_P_DATE, LEAD(g.MAX_P_DATE) OVER(Partition By t.ID, t.CD, t.A_VALUE Order By t.ID, t.CD, t.A_VALUE, t.P_DATE)) "MAX_P_DATE"
FROM tbl t
INNER JOIN
( SELECT ID, CD, A_VALUE, NEXT_DAY_DIFF, PREV_DAY_DIFF,
MIN( CASE WHEN (PREV_DAY_DIFF > 1 And NEXT_DAY_DIFF = 1) THEN P_DATE
WHEN (PREV_DAY_DIFF = 0 And NEXT_DAY_DIFF = 1) THEN P_DATE
END ) OVER( Partition By ID, CD, A_VALUE, PREV_DAY_DIFF Order By ID, CD, A_VALUE, P_DATE ) "MIN_P_DATE",
MAX( CASE WHEN (NEXT_DAY_DIFF > 1 And PREV_DAY_DIFF = 1) THEN P_DATE
WHEN (NEXT_DAY_DIFF = 0 And PREV_DAY_DIFF = 1) THEN P_DATE
WHEN (PREV_DAY_DIFF > 1 And NEXT_DAY_DIFF = 1) THEN P_DATE + 1
END ) OVER( Partition By ID, CD, A_VALUE, NEXT_DAY_DIFF Order By ID, CD, A_VALUE, P_DATE ) "MAX_P_DATE"
FROM grid
WHERE NEXT_DAY_DIFF - PREV_DAY_DIFF != 0
) g ON (t.ID = g.ID And t.CD = g.CD And t.A_VALUE = g.A_VALUE And t.P_DATE = g.MIN_P_DATE OR t.P_DATE = g.MAX_P_DATE)
ORDER BY t.ID, t.CD, t.A_VALUE
This results in:
ID CD A_VALUE MIN_P_DATE MAX_P_DATE
-- -- ------- ---------- ----------
1  A  9       12-JAN-12  16-JAN-12
1  A  10      10-JAN-12  11-JAN-12
1  A  10      17-JAN-12  18-JAN-12
