I am trying to find how many events occur per year. Currently I have this query, which basically counts the events that have visitors in a given year:
SELECT
count(visitors_y_2016) as y_16,
count(visitors_y_2017) as y_17,
count(visitors_y_2018) as y_18,
count(visitors_y_2019) as y_19,
count(visitors_y_2020) as y_20
FROM event
;
y_16  y_17  y_18  y_19  y_20
  23    25    26    27    19
But what I am looking for is the output ordered by the years with the most events:
y_19  27
y_18  26
y_17  25
y_16  23
y_20  19
Any idea how to accomplish that?
Your table design looks quite strange, as such information should be in rows rather than columns.
But you can UNION ALL the results and then sort them:
CREATE TABLE event (visitors_y_2016 int, visitors_y_2017 int, visitors_y_2018 int, visitors_y_2019 int, visitors_y_2020 int);
(SELECT
'y_16' AS yr, count(visitors_y_2016) AS cnt
FROM event
UNION ALL
SELECT
'y_17',count(visitors_y_2017)
FROM event
UNION ALL
SELECT
'y_18',
count(visitors_y_2018)
FROM event
UNION ALL
SELECT
'y_19',
count(visitors_y_2019)
FROM event
UNION ALL
SELECT
'y_20',
count(visitors_y_2020)
FROM event)
ORDER BY cnt DESC
;
yr   | cnt
:------- | --:
y_16 | 0
y_17 | 0
y_18 | 0
y_19 | 0
y_20 | 0
You can "unpivot" with a VALUES expression in a LATERAL subquery:
SELECT t.*
FROM (
SELECT count(visitors_y_2016) AS y16
, count(visitors_y_2017) AS y17
, count(visitors_y_2018) AS y18
, count(visitors_y_2019) AS y19
, count(visitors_y_2020) AS y20
FROM event
) e, LATERAL (
VALUES
(16, e.y16)
, (17, e.y17)
, (18, e.y18)
, (19, e.y19)
, (20, e.y20)
) t(year, count)
ORDER BY count DESC; -- your desired sort order
Since this only needs a single scan over the table, it's many times faster than aggregating every output value separately.
Each line in the VALUES expression forms a row with two columns: year (number defaults to integer) and count (type of referenced column).
See:
Query for crosstab view
SELECT DISTINCT on multiple columns
About LATERAL subqueries:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
But your table design raises questions. Typically you'd have a single date or timestamp column instead of visitors_y_2016, visitors_y_2017 etc., and a simpler query based on that, like the sketch below.
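For illustration, a minimal sketch of such a normalized design; the table and column names here are hypothetical, not from the question:

-- one row per event and date instead of one column per year (hypothetical schema)
CREATE TABLE event_visitors (
  event_id   int
, visit_date date
, visitors   int
);

-- counting events per year then needs no per-year columns:
SELECT EXTRACT(year FROM visit_date) AS year
     , count(visitors) AS events
FROM   event_visitors
GROUP  BY 1
ORDER  BY events DESC;

A new year then just means new rows, not a new column and a rewritten query.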
I don't think you need a SELECT per year. I don't know your exact table, but there should be a better way to organize your data. Also, ORDER BY is your friend if you want to sort data (there is no SORT BY in standard SQL); you just need a single SELECT to use it, for example:
ORDER BY
VISITOR_COUNT
Related
I have the following table that contains quantities of items per day.
ID Date Item Count
-----------------------------
1 2022-01-01 Milk 10
2 2022-01-11 Milk 20
3 2022-01-12 Milk 10
4 2022-01-15 Milk 12
5 2022-01-16 Milk 10
6 2022-01-02 Bread 20
7 2022-01-03 Bread 22
8 2022-01-05 Bread 24
9 2022-01-08 Bread 20
10 2022-01-12 Bread 10
I want to aggregate (sum, avg, ...) the quantity per item for the last 7 days (or 14, 28 days). The expected outcome would look like this table.
ID Date Item Count Sum_7d
-------------------------------------
1 2022-01-01 Milk 10 10
2 2022-01-11 Milk 20 20
3 2022-01-12 Milk 10 30
4 2022-01-15 Milk 12 42
5 2022-01-16 Milk 10 52
6 2022-01-02 Bread 20 20
7 2022-01-03 Bread 22 42
8 2022-01-05 Bread 24 66
9 2022-01-08 Bread 10 56
10 2022-01-12 Bread 10 20
My first approach was using Redshift window functions like this
SELECT *, SUM(Count) OVER (PARTITION BY Item
ORDER BY Date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS Sum_7d
FROM my_table
but it does not give the expected results because there are missing dates and I could not figure out how to put a condition on the time range.
My fallback solution is a cross product, but that's not desirable because it is inefficient for large data.
SELECT l.Date, l.Item, l.Count, sum(r.Count) as Sum_7d
FROM my_table l,
my_table r
WHERE l.Date - r.Date < 7
AND l.Date - r.Date >= 0
AND l.Item = r.Item
GROUP BY 1, 2, 3
Is there any efficient and concise way to do such an aggregation on date ranges in Redshift?
Related:
Can I put a condition on a window function in Redshift?
Redshift SQL Window Function frame_clause with days
This is a missing data problem, and a common way to "fill in the blanks" is with a cross join. You correctly point out that this can get very expensive, because cross joining (usually) massively expands the data being worked upon AND because Redshift isn't great at creating data. But you do have to fill in the missing data. The best way I have found is to create the (near) minimum data set that will complete the data, then UNION this data to the original table. The code below takes this path.
There is a way to do this without adding rows, but the SQL is large, inflexible, error prone, and just plain ugly (a sketch follows). You could create new columns (date and count) based on LAG(6), LAG(5), LAG(4) ... and compare the date of each, using the count only if the date is truly in range. If you want to sum a different date look-back you need to add columns, and things get uglier. Also, this will only be faster than the code below in certain circumstances (very few repeats of item). It just replaces making new data in rows with making new data in columns. So don't go this way unless absolutely necessary.
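For illustration only, here is a sketch of that column-based approach for the 7-day case, written against the test table defined below (it assumes at most one row per item and date, as in the sample data):

SELECT id, dt, item, cnt,
       cnt
       + CASE WHEN d1 >= dt - 6 THEN c1 ELSE 0 END
       + CASE WHEN d2 >= dt - 6 THEN c2 ELSE 0 END
       + CASE WHEN d3 >= dt - 6 THEN c3 ELSE 0 END
       + CASE WHEN d4 >= dt - 6 THEN c4 ELSE 0 END
       + CASE WHEN d5 >= dt - 6 THEN c5 ELSE 0 END
       + CASE WHEN d6 >= dt - 6 THEN c6 ELSE 0 END AS sum_7d
FROM (
    -- one LAG pair per possible prior row in the window; a 14-day sum needs 13 pairs
    SELECT id, dt, item, cnt,
           LAG(dt, 1) OVER (PARTITION BY item ORDER BY dt) AS d1, LAG(cnt, 1) OVER (PARTITION BY item ORDER BY dt) AS c1,
           LAG(dt, 2) OVER (PARTITION BY item ORDER BY dt) AS d2, LAG(cnt, 2) OVER (PARTITION BY item ORDER BY dt) AS c2,
           LAG(dt, 3) OVER (PARTITION BY item ORDER BY dt) AS d3, LAG(cnt, 3) OVER (PARTITION BY item ORDER BY dt) AS c3,
           LAG(dt, 4) OVER (PARTITION BY item ORDER BY dt) AS d4, LAG(cnt, 4) OVER (PARTITION BY item ORDER BY dt) AS c4,
           LAG(dt, 5) OVER (PARTITION BY item ORDER BY dt) AS d5, LAG(cnt, 5) OVER (PARTITION BY item ORDER BY dt) AS c5,
           LAG(dt, 6) OVER (PARTITION BY item ORDER BY dt) AS d6, LAG(cnt, 6) OVER (PARTITION BY item ORDER BY dt) AS c6
    FROM test
) lagged;

The NULLs that LAG() produces at the start of each partition fail the CASE comparison and fall through to the ELSE 0 branch, so the edges behave correctly.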
Now to what I think will work for you. You need a dummy row for every date and item combination that doesn't already exist. This is the minimal set of new data that will make your window function work. In reality I make all the combinations of date and item and merge these with the existing rows - a slight compromise from the ideal.
First let's set up your data. I changed some names as using reserved words for column names is not ideal.
create table test (ID int, dt date, Item varchar(16), Cnt int);
insert into test values
(1, '2022-01-01', 'Milk', 10),
(2, '2022-01-11', 'Milk', 20),
(3, '2022-01-12', 'Milk', 10),
(4, '2022-01-15', 'Milk', 12),
(5, '2022-01-16', 'Milk', 10),
(6, '2022-01-02', 'Bread', 20),
(7, '2022-01-03', 'Bread', 22),
(8, '2022-01-05', 'Bread', 24),
(9, '2022-01-08', 'Bread', 20),
(10, '2022-01-12', 'Bread', 10);
The SQL for generating what you want is:
with recursive dates(dt) as
( select min(dt) as dt
from test
union all
select dt + 1
from dates d
where d.dt <= current_date
)
select *
from (
SELECT *, SUM(Cnt) OVER (PARTITION BY Item
ORDER BY Dt
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS Sum_7d
FROM (
select min(id) as id, dt, item, sum(cnt) as cnt
from (
select *
from test
union all
select NULL as id, dt, item, NULL as cnt
from ( select distinct item from test) as items
cross join dates
) as all_item_dates
group by dt, item
) as grouped
) as windowed
where id is not null
order by id, dt;
Quickly, here's what this does:
A recursive CTE creates the date range in question (from min date in table until today).
These dates are cross joined with the distinct list of items resulting in every date for every unique item.
This is UNIONed to the table so all data exists.
GROUP By is used to merge real data rows with dummy rows for the same item and date.
Your window function is run.
A surrounding SELECT has a WHERE clause to remove any dummy rows.
As you will note, this does use a cross join, but on a much reduced set of data (just the unique item list). As long as this distinct list of items is much shorter than the table (very likely), this will perform much faster than other techniques. Also, if this is the kind of data you have, you might find this post I wrote interesting: http://wad-design.s3-website-us-east-1.amazonaws.com/sql_limits_wp_2.html
I want to do a datediff between 2 dates on different rows only if the rows have a condition.
my table looks like the following, with additional columns (like guid)
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | with this
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
With this example I would like to have 2 rows in my selection, representing the difference between the dates of ids 5-3 and ids 2-1.
As of now I have come up with a query that gives me the difference between the dates of ids 5-3, ids 5-1 and ids 2-1:
with t as (
SELECT TOP (100000)
*
FROM mydatatable
order by CreateDateAndTime desc)
select
DATEDIFF(SECOND, f.CreateDateAndTime, s.CreateDateAndTime) time
from t f
join t s on (f.[guid] = s.[guid] )
where f.condition like '%I need to compare this state%'
and s.condition like '%with this%'
and (f.id - s.id) < 0
My problem is I cannot set f.id - s.id to a value since other rows can be between the ones I want to make the diff on.
How can I make the datediff only on the first rows that meet my conditions?
EDIT: To make it clearer:
My condition is an event name, and I want to calculate the time between the occurrence of my event 1 and my event 2, filling a column named time, for example.
@Salman A's answer is really close to what I want, except that it does not work when my event 2 does not happen (a case which was not in my initial example),
i.e. in a table like the following, it will make the datediff between row id 5 and row id 2:
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | state 3
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
the code I modified :
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id desc ) AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this')
)
SELECT *
,DATEDIFF(second, currdate, prevdate) time
FROM cte
WHERE condition = 'I need to compare this state'
and DATEDIFF(second, currdate, prevdate) != 0
order by id desc
Perhaps you want to match ids with the nearest smaller id. You can use window functions for this:
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
     , CASE WHEN LAG(condition) OVER (PARTITION BY guid ORDER BY id) = 'with this'
            THEN LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id)
       END AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this')
)
SELECT *
, DATEDIFF(second, currdate, prevdate)
FROM cte
WHERE condition = 'I need to compare this state'
The CASE expression will match 'I need to compare this state' rows with 'with this' rows. If you have mismatching pairs then it'll return NULL.
Try using the analytic function lead():
with cte as
(
select 1 as id, '2018-12-11 12:07:55.273' as CreateDateAndTime,'with this' as condition union all
select 2,'2018-12-11 12:07:53.550','I need to compare this state' union all
select 3,'2018-12-11 12:07:53.550','with this' union all
select 4,'2018-12-11 12:06:40.780','state 3' union all
select 5,'2018-12-11 12:06:39.317','I need to compare this state'
) select *,
DATEDIFF(SECOND,CreateDateAndTime,lead(CreateDateAndTime) over(order by Id))
from cte
where condition in ('with this','I need to compare this state')
You ideally want LEADIF/LAGIF functions, because you are looking for the previous row where condition = 'with this'. Since there are no LEADIF/LAGIF functions, I think the best option is to use OUTER/CROSS APPLY with TOP 1, e.g.
CREATE TABLE #T (Id INT, CreateDateAndTime DATETIME, condition VARCHAR(28));
INSERT INTO #T (Id, CreateDateAndTime, condition)
VALUES
(1, '2018-12-11 12:07:55', 'with this'),
(2, '2018-12-11 12:07:53', 'I need to compare this state'),
(3, '2018-12-11 12:07:53', 'with this'),
(4, '2018-12-11 12:06:40', 'state 3'),
(5, '2018-12-11 12:06:39', 'I need to compare this state');
SELECT ID1 = t1.ID,
Date1 = t1.CreateDateAndTime,
ID2 = t2.ID,
Date2 = t2.CreateDateAndTime,
Difference = DATEDIFF(SECOND, t1.CreateDateAndTime, t2.CreateDateAndTime)
FROM #T AS t1
CROSS APPLY
( SELECT TOP 1 t2.CreateDateAndTime, t2.ID
FROM #T AS t2
WHERE t2.Condition = 'with this'
AND t2.CreateDateAndTime > t1.CreateDateAndTime
--AND t2.GUID = t.GUID
ORDER BY CreateDateAndTime
) AS t2
WHERE t1.Condition = 'I need to compare this state';
Which gives:
ID1  Date1                    ID2  Date2                    Difference
----------------------------------------------------------------------
2    2018-12-11 12:07:53.000  1    2018-12-11 12:07:55.000  2
5    2018-12-11 12:06:39.000  3    2018-12-11 12:07:53.000  74
I would enumerate the values and then use window functions for the difference.
select min(id), max(id),
datediff(second, min(CreateDateAndTime), max(CreateDateAndTime)) as seconds
from (select t.*,
row_number() over (partition by condition order by CreateDateAndTime) as seqnum
from t
where condition in ('I need to compare this state', 'with this')
) t
group by seqnum;
I cannot tell what you want the results to look like. This version only outputs the differences, with the ids of the rows you care about. The difference can also be applied to the original rows, rather than put into summary rows, as in the sketch below.
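For example, a sketch of that variant, using the same table and enumeration as above; the pair's difference is attached to every row of the pair with windowed min/max instead of collapsing pairs into summary rows:

select id, CreateDateAndTime, condition,
       datediff(second,
                min(CreateDateAndTime) over (partition by seqnum),
                max(CreateDateAndTime) over (partition by seqnum)) as seconds
from (select id, CreateDateAndTime, condition,
             row_number() over (partition by condition order by CreateDateAndTime) as seqnum
      from t
      where condition in ('I need to compare this state', 'with this')
     ) t;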
How do I get substrings from a column to use in filter and GROUP BY clauses in an AWS Redshift database?
I have table with records like:
Table_Id | Categories | Value
<ID> | ABC1; ABC1-1; XYZ | 10
<ID> | ABC1; ABC1-2; XYZ | 15
<ID> | XYZ | 5
.....
Now I want to filter records based on an individual category like 'ABC1', or on 'ABC1 and XYZ'.
The expected output from the query would look like:
Table_Id | Categories | Value
<ID> | ABC1 | 25
<ID> | ABC1-1 | 10
<ID> | ABC1-2 | 15
<ID> | XYZ | 30
.....
So I need to group results based on individual categories.
If you have at most 3 values in any "categories" cell you can unnest the cells, get the list of unique values and use that list in a join condition like this:
WITH
values as (
select distinct category
from (
select distinct split_part(categories,';',1) as category from your_table
union select distinct split_part(categories,';',2) from your_table
union select distinct split_part(categories,';',3) from your_table
)
where nullif(category,'') is not null
)
SELECT
t2.category
,sum(t1.value)
FROM your_table t1
JOIN values t2
ON split_part(categories,';',1)=t2.category
OR split_part(categories,';',2)=t2.category
OR split_part(categories,';',3)=t2.category
If you have more than 3 values, just add another split_part level, both in the WITH part and in the join condition.
@JonScott, @AlexYes and other pals who struggle with similar kinds of situations:
I found a better approach than the one suggested by @AlexYes.
What I did is flatten the categories column, which results in individual records that I can process further.
Query:
select row_number() over(order by 1) as r1,
       to_char(timestamptz 'epoch' + date_time * interval '1 second', 'yyyy-mm-dd') as day,
       split_part(categories, ';', numbers.n) as catg,
       value
from <TABLE>
join numbers
  on numbers.n <= regexp_count(categories, ';') + 1
<OTHER_CONDITIONS>
Explanation:
Two functions are useful here: first, the split_part function, which takes a string, splits it on the ';' delimiter, and returns the first, second, ..., nth value from the split string; second, regexp_count, which tells us how many times a particular pattern is found in our string.
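Note that this query assumes a numbers table (one integer per row) already exists. If yours doesn't, one common way to build it in Redshift is to number the rows of any sufficiently large existing table; a sketch, with the source table as a placeholder:

-- n must go at least as high as the maximum number of categories per cell
create table numbers as
select row_number() over (order by 1) as n
from <TABLE>
limit 100;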
To do this fully dynamically, you need to transpose or pivot the values in the "categories" column into separate rows.
Unfortunately, a "fully dynamic" solution (without knowing the different values beforehand) is not possible in Redshift.
Your options are as follows:
1. Use the method suggested by AlexYes in another answer. This is semi-dynamic and is probably your best option.
2. Outside of Redshift, run some ETL code to perform the column -> multiple rows transformation.
3. Create a hardcoded solution, and perform the pivot something like this:
select table_id, 'ABC1' as category,
       case when concat(categories, ';') ilike '%ABC1;%' then value else 0 end as value
from your_table
union all
select table_id, 'ABC1-1' as category,
       case when concat(categories, ';') ilike '%ABC1-1;%' then value else 0 end as value
from your_table
union all
etc
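To get from there to the aggregated output shown in the question, you can wrap the hardcoded union in a GROUP BY; a sketch, assuming one union branch per category of interest:

select category, sum(value) as value
from (
    select table_id, 'ABC1' as category,
           case when concat(categories, ';') ilike '%ABC1;%' then value else 0 end as value
    from your_table
    union all
    select table_id, 'XYZ' as category,
           case when concat(categories, ';') ilike '%XYZ;%' then value else 0 end as value
    from your_table
    -- ... one branch per hardcoded category
) unpivoted
group by category;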
I've got a values table such as:
id | user_id | value | date
---------------------------------
1 | 12 | 38 | 2014-04-05
2 | 15 | 19 | 2014-04-05
3 | 12 | 47 | 2014-04-08
I want to retrieve all values for given dates. However, if I don't have a value for one specific date, I want to get the previous available value. For instance, with the above dataset, if I query values for user 12 for dates 2014-04-07 and 2014-04-08, I need to retrieve 38 and 47.
I succeeded using one query per date, like:
SELECT *
FROM values
WHERE date <= $date
ORDER BY date DESC
LIMIT 1
However, it would require dates.length queries each time. So I'm wondering if there is a more performant solution that retrieves all my values in a single request.
In general, you would use a VALUES clause to specify multiple values in a single query.
If you have only occasional dates missing (and thus no big gaps in dates between rows for any particular user_id) then this would be an elegant solution:
SELECT dt, coalesce(value, lag(value) OVER (ORDER BY dt)) AS value
FROM (VALUES ('2014-04-07'::date), ('2014-04-08')) AS dates(dt)
LEFT JOIN "values" ON "date" = dt AND user_id = 12;
The lag() window function picks the previous value if the current row does not have a value.
If, on the other hand, there may be big gaps, you need to do some more work:
SELECT DISTINCT dt, first_value(value) OVER (PARTITION BY dt ORDER BY diff) AS value
FROM (
   SELECT dt, value, dt - "date" AS diff
   FROM (VALUES ('2014-04-07'::date), ('2014-04-08')) AS dates(dt)
   CROSS JOIN "values"
   WHERE user_id = 12
   AND "date" <= dt) sub;
In this case a CROSS JOIN is made for user_id = 12 (restricted to rows on or before each target date) and the differences between the dates in the VALUES clause and the table rows are computed in a sub-query. So every row has a value for the field value. In the main query the value with the smallest difference per target date is selected using the first_value() window function. Note that ordering on diff and picking the first row would not work here because you want values for multiple dates returned.
I'm struggling to find the query for the following task.
I have the following data and want to find the total network days for each unique ID:
ID From To NetworkDay
1 03-Sep-12 07-Sep-12 5
1 03-Sep-12 04-Sep-12 2
1 05-Sep-12 06-Sep-12 2
1 06-Sep-12 12-Sep-12 5
1 31-Aug-12 04-Sep-12 3
2 04-Sep-12 06-Sep-12 3
2 11-Sep-12 13-Sep-12 3
2 05-Sep-12 08-Sep-12 3
The problem is that the date ranges can overlap, and I can't come up with SQL that will give me the following results:
ID From To NetworkDay
1 31-Aug-12 12-Sep-12 9
2 04-Sep-12 08-Sep-12 4
2 11-Sep-12 13-Sep-12 3
and then
ID Total Network Day
1 9
2 7
In case the network day calculation is not possible, just getting to the second table would be sufficient.
I hope my question is clear.
We can use Oracle Analytics, namely the "OVER ... PARTITION BY" clause, to do this. The PARTITION BY clause is kind of like a GROUP BY, but without the aggregation part. That means we can group rows together (i.e. partition them) and then perform an operation on them as separate groups. As we operate on each row we can then access the columns of the previous row above. This is the feature PARTITION BY gives us. (PARTITION BY is not related to partitioning of a table for performance.)
So then how do we output the non-overlapping dates? We first order the query based on the (ID, DFROM) fields, then we use the ID field to make our partitions (row groups). We then test the previous row's TO value and the current row's FROM value for overlap, using an expression like this (in pseudo code):
max(previous.DTO, current.DFROM) as DFROM
This basic expression will return the original DFROM value if it doesn't overlap, but will return the previous TO value if there is overlap. Since our rows are ordered, we only need to be concerned with the last row. In cases where a previous row completely overlaps the current row, we want the row to then have a 'zero' date range. So we do the same thing for the DTO field to get:
max(previous.DTO, current.DFROM) as DFROM, max(previous.DTO, current.DTO) as DTO
Once we have generated the new results set with the adjusted DFROM and DTO values, we can aggregate them up and count the range intervals of DFROM and DTO.
Be aware that most date calculations in databases are not inclusive the way your data is. So something like DATEDIFF(dto, dfrom) will not include the day dto actually refers to, so we will want to adjust dto up a day first.
I don't have access to an Oracle server anymore, but I know this is possible with Oracle Analytics. The query should go something like this:
(Please update my post if you get this to work.)
SELECT id,
       -- GREATEST is Oracle's scalar two-argument max; LAG returns the previous
       -- row's dto, and NVL handles the first row of each partition
       GREATEST(dfrom, NVL(LAG(dto) OVER (PARTITION BY id ORDER BY dfrom), dfrom)) as dfrom,
       GREATEST(dto,   NVL(LAG(dto) OVER (PARTITION BY id ORDER BY dfrom), dto))  as dto
from (
    select id, dfrom, dto + 1 as dto from my_sample -- adjust the table so that dto becomes non-inclusive
    order by id, dfrom
) sample;
The secret here is the LAG(dto) OVER (PARTITION BY id ORDER BY dfrom) expression, which returns the dto value from the row previous to the current row.
So this query should output new dfrom/dto values which don't overlap. It's then a simple matter of sub-querying this, computing (dto - dfrom), and summing the totals, for example:
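A sketch of that final step, inlining the query above (note this sums calendar days in each merged range; the weekend/holiday logic for true network days would still need to be applied):

SELECT id, SUM(dto - dfrom) AS total_days  -- date subtraction yields days in Oracle
FROM (
    SELECT id,
           GREATEST(dfrom, NVL(LAG(dto) OVER (PARTITION BY id ORDER BY dfrom), dfrom)) as dfrom,
           GREATEST(dto,   NVL(LAG(dto) OVER (PARTITION BY id ORDER BY dfrom), dto))  as dto
    FROM (select id, dfrom, dto + 1 as dto from my_sample)
) merged
GROUP BY id;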
Using MySQL
I did have access to a MySQL server, so I got it working there. MySQL doesn't have result set partitioning (analytics) like Oracle, so we have to use result set variables. This means we use @var := xxx type expressions to remember the last date value and adjust dfrom/dto accordingly. Same algorithm, just a little longer and with more complex syntax. We also have to forget the last date value any time the ID field changes!
So here is the sample table (same values you have):
create table sample(id int, dfrom date, dto date, networkDay int);
insert into sample values
(1,'2012-09-03','2012-09-07',5),
(1,'2012-09-03','2012-09-04',2),
(1,'2012-09-05','2012-09-06',2),
(1,'2012-09-06','2012-09-12',5),
(1,'2012-08-31','2012-09-04',3),
(2,'2012-09-04','2012-09-06',3),
(2,'2012-09-11','2012-09-13',3),
(2,'2012-09-05','2012-09-08',3);
On to the query, we output the un-grouped result set like above:
The variable @ldt is "last date", and the variable @lid is "last id". Anytime @lid changes, we reset @ldt to null. FYI, in MySQL the := operator is where the assignment happens; an = operator is just equality.
This is a 3-level query, but it could be reduced to 2. I went with an extra outer query to keep things more readable. The innermost query is simple: it adjusts the dto column to be non-inclusive and does the proper row ordering. The middle query adjusts the dfrom/dto values to make them non-overlapping. The outer query simply drops the unused fields and calculates the interval range.
set @ldt=null, @lid=null;
select id, no_dfrom as dfrom, no_dto as dto, datediff(no_dto, no_dfrom) as days from (
    select if(@lid=id,@ldt,@ldt:=null) as last, dfrom, dto, if(@ldt>=dfrom,@ldt,dfrom) as no_dfrom, if(@ldt>=dto,@ldt,dto) as no_dto, @ldt:=if(@ldt>=dto,@ldt,dto), @lid:=id as id,
           datediff(dto, dfrom) as overlapped_days
    from (select id, dfrom, dto + INTERVAL 1 DAY as dto from sample order by id, dfrom) as sample
) as nonoverlapped
order by id, dfrom;
The above query gives the results (notice dfrom/dto are non-overlapping here):
+------+------------+------------+------+
| id | dfrom | dto | days |
+------+------------+------------+------+
| 1 | 2012-08-31 | 2012-09-05 | 5 |
| 1 | 2012-09-05 | 2012-09-08 | 3 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-13 | 5 |
| 2 | 2012-09-04 | 2012-09-07 | 3 |
| 2 | 2012-09-07 | 2012-09-09 | 2 |
| 2 | 2012-09-11 | 2012-09-14 | 3 |
+------+------------+------------+------+
How about constructing SQL which merges intervals by removing holes and considering only maximal intervals? It goes like this (not tested):
SELECT DISTINCT F.ID, F.From, L.To
FROM Temp AS F, Temp AS L
WHERE F.From < L.To AND F.ID = L.ID
AND NOT EXISTS (SELECT *
FROM Temp AS T
WHERE T.ID = F.ID
AND F.From < T.From AND T.From < L.To
AND NOT EXISTS ( SELECT *
FROM Temp AS T1
WHERE T1.ID = F.ID
AND T1.From < T.From
AND T.From <= T1.To)
)
AND NOT EXISTS (SELECT *
FROM Temp AS T2
WHERE T2.ID = F.ID
AND (
(T2.From < F.From AND F.From <= T2.To)
OR (T2.From < L.To AND L.To < T2.To)
)
)
with t_data as (
select 1 as id,
to_date('03-sep-12','dd-mon-yy') as start_date,
to_date('07-sep-12','dd-mon-yy') as end_date from dual
union all
select 1,
to_date('03-sep-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('05-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('06-sep-12','dd-mon-yy'),
to_date('12-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('31-aug-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('04-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('11-sep-12','dd-mon-yy'),
to_date('13-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('05-sep-12','dd-mon-yy'),
to_date('08-sep-12','dd-mon-yy') from dual
),
t_holidays as (
select to_date('01-jan-12','dd-mon-yy') as holiday
from dual
),
t_data_rn as (
select rownum as rn, t_data.* from t_data
),
t_model as (
select distinct id,
start_date
from t_data_rn
model
partition by (rn, id)
dimension by (0 as i)
measures(start_date, end_date)
rules
( start_date[for i
from 1
to end_date[0]-start_date[0]
increment 1] = start_date[0] + cv(i),
end_date[any] = start_date[cv()] + 1
)
order by 1,2
),
t_network_days as (
select t_model.*,
case when
mod(to_char(start_date, 'j'), 7) + 1 in (6, 7)
or t_holidays.holiday is not null
then 0 else 1
end as working_day
from t_model
left outer join t_holidays
on t_holidays.holiday = t_model.start_date
)
select id,
sum(working_day) as network_days
from t_network_days
group by id;
t_data - your initial data
t_holidays - contains list of holidays
t_data_rn - just adds unique key (rownum) to each row of t_data
t_model - expands t_data date ranges into a flat list of dates
t_network_days - marks each date from t_model as working day or weekend based on day of week (Sat and Sun) and holidays list
final query - calculates the number of network days for each group.