Matching strings between columns based on position - SQL

I have a view that aggregates data about customers and shows the products they have access to, along with whether they use those products on a trial basis (both stored as comma-separated strings):
+----------+----------+-----------------------+
| customer | products | products_trial_status |
+----------+----------+-----------------------+
| 234253 | A,B,C | false,true,false |
| 923403 | A,C | true,true |
| 123483 | B | true |
| 239874 | B,C | false,false |
+----------+----------+-----------------------+
I would like to write a query that returns a list of customers who are using a certain product on a trial.
E.g. if I want to see which customers using product B are on a trial, I would get something like this:
+----------+
| customer |
+----------+
| 234253 |
| 123483 |
+----------+
The only way I can think of doing this is by checking the products column for the position of the product in the string (if it exists there), then checking the corresponding value at the same position in the products_trial_status column and whether it is equal to true.
i.e. for customer 234253, product B is in position 2 (after the first comma), so its corresponding trial status would also be in position 2, after the first comma in that column.
How would I go about doing this?
I am aware that storing such data as a string of values is not good practice, but it is not something I can change, so I need to work with the format as it is.

You could split the string, but it will be quicker to do some (fairly hideous) string manipulation:
1. Replace your comma-delimited true/false string with a non-delimited string of 1s and 0s.
2. Count the number of terms before the B term by counting the number of preceding commas.
3. Return the appropriate substring of your 1/0 list.
Like this:
SELECT customer,
       COALESCE(
         SUBSTR(
           status,
           -- B's position in the list = the number of commas in preceding_terms
           LENGTH(preceding_terms) - COALESCE(LENGTH(REPLACE(preceding_terms, ',')), 0),
           1
         ),
         '0'
       ) AS hasB
FROM (
  SELECT customer,
         -- everything up to and including the ',B,' match (NULL if B is absent)
         SUBSTR(','||products, 1, INSTR(','||products||',', ',B,')) AS preceding_terms,
         -- TRANSLATE maps t->1, f->0 and deletes every other character
         TRANSLATE(products_trial_status, 'tfrueals,', '10') AS status
  FROM   table_name
)
Which, for the sample data:
CREATE TABLE table_name ( customer, products, products_trial_status ) AS
SELECT 234253, 'A,B,C', 'false,true,false' FROM DUAL UNION ALL
SELECT 923403, 'A,C', 'true,true' FROM DUAL UNION ALL
SELECT 123483, 'B', 'true' FROM DUAL UNION ALL
SELECT 239874, 'B,C', 'false,false' FROM DUAL;
Outputs:
+----------+------+
| CUSTOMER | HASB |
+----------+------+
|   234253 | 1    |
|   923403 | 0    |
|   123483 | 1    |
|   239874 | 0    |
+----------+------+
If you only want the customer numbers then you can add filters:
SELECT customer
FROM (
  SELECT customer,
         SUBSTR(','||products, 1, INSTR(','||products||',', ',B,')) AS preceding_terms,
         TRANSLATE(products_trial_status, 'tfrueals,', '10') AS status
  FROM   table_name
  WHERE  INSTR(','||products||',', ',B,') > 0
)
WHERE SUBSTR(
        status,
        LENGTH(preceding_terms) - COALESCE(LENGTH(REPLACE(preceding_terms, ',')), 0),
        1
      ) = '1'
Which outputs:
+----------+
| CUSTOMER |
+----------+
|   234253 |
|   123483 |
+----------+

You can use a hierarchical query that splits the strings on commas with the regular-expression functions REGEXP_SUBSTR and REGEXP_COUNT (the PRIOR customer = customer and PRIOR sys_guid() IS NOT NULL conditions keep CONNECT BY from cross-joining the expansions of different rows):
WITH t1 AS (
  SELECT customer,
         REGEXP_SUBSTR(products, '[^,]+', 1, level) AS products,
         REGEXP_SUBSTR(products_trial_status, '[^,]+', 1, level) AS products_ts
  FROM   t -- your data source
  CONNECT BY level <= REGEXP_COUNT(products, ',') + 1
         AND PRIOR customer = customer
         AND PRIOR sys_guid() IS NOT NULL
)
SELECT customer
FROM   t1
WHERE  products = 'B'
AND    products_ts = 'true'


How to create an observation with 0 in the column

I am using the code below to get the quarterly wages for individuals from 2010Q1-2020Q4. If an individual did not work in a particular quarter they do not have an observation for that quarter. Instead, I would like for there to be an observation but have the quarterly wage be 0. For example,
What is currently happening:
| MPI | Quarter| Wage|
|PersonA|2010Q1 | 100 |
|PersonA|2010Q2 | 100 |
|PersonA|2010Q3 | 100 |
|PersonB|2010Q1 | 100 |
Desired output
| MPI | Quarter| Wage|
|PersonA|2010Q1 | 100 |
|PersonA|2010Q2 | 100 |
|PersonA|2010Q3 | 100 |
|PersonA|2010Q4 | 0 |
|PersonB|2010Q1 | 100 |
|PersonB|2010Q2 | 0 |
|PersonB|2010Q3 | 0 |
|PersonB|2010Q4 | 0 |
ws_data AS (
    SELECT
        MASTER_PERSON_INDEX AS mpi
        ,SUBSTR(cast(wg.naics as string), 1, 2) AS NAICS_2
        ,SUBSTR(cast(wg.yrqtr as string), 0, 5) AS quarter
        ,wg.yrqtr
        ,wg.employer
        ,wg.wages
        ,SUBSTR(cast(wg.yrqtr as string), 0, 4) AS YEAR
    FROM (
        SELECT *
        FROM `ws.ws_ui_wage_records_di` wsui
        WHERE wsui.MASTER_PERSON_INDEX IN (SELECT mpi FROM rc_table_ra16_all_grads_1b)
          AND wsui.yrqtr IN (20101, 20102, 20103, 20104,
                             20111, 20112, 20113, 20114,
                             20121, 20122, 20123, 20124,
                             20131, 20132, 20133, 20134,
                             20141, 20142, 20143, 20144,
                             20151, 20152, 20153, 20154,
                             20161, 20162, 20163, 20164,
                             20171, 20172, 20173, 20174,
                             20181, 20182, 20183, 20184,
                             20191, 20192, 20193, 20194,
                             20201, 20202, 20203, 20204)
    ) wg
),
ws_agg AS (
    SELECT
        mpi
        -- ,STATS_MODE(NAICS_2) AS NAICS_2
        -- ,STATS_MODE(NAICS_DESC) AS NAICS_DESC
        ,quarter
        ,SUM(wages) AS wages_quart
    FROM ws_data
    GROUP BY mpi, quarter
),
ws_annot AS (
    SELECT
        dagg.*
        ,row_number() OVER (PARTITION BY dagg.mpi, cast(wages_quart as string) ORDER BY dagg.wages_quart DESC) AS rn
    FROM ws_agg dagg
)
Try using this data to create a CTE at the top as a Quarter table, and then use that as the starting point in your main FROM statement. You should be able to replace the hardcoded list in the wg WHERE clause with that top CTE as well:
(20101, 20102, 20103, 20104,
20111, 20112, 20113, 20114,
20121, 20122, 20123, 20124,
20131, 20132, 20133, 20134,
20141, 20142, 20143, 20144,
20151, 20152, 20153, 20154,
20161, 20162, 20163, 20164,
20171, 20172, 20173, 20174,
20181, 20182, 20183, 20184,
20191, 20192, 20193, 20194,
20201, 20202, 20203, 20204)
Your db may have a DateDimension table with quarters in it that you could use as well.
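For example, a rough sketch of that quarters CTE driving the join, reusing the table and column names from the question (illustrative only, not tested):
WITH quarters AS (
    SELECT 20101 AS yrqtr UNION ALL SELECT 20102 UNION ALL
    SELECT 20103 UNION ALL SELECT 20104
    -- ...continue through 20204
),
people AS (
    SELECT mpi FROM rc_table_ra16_all_grads_1b
)
SELECT p.mpi,
       q.yrqtr,
       COALESCE(SUM(wg.wages), 0) AS wages_quart  -- 0 where no wage record exists
FROM people p
CROSS JOIN quarters q
LEFT JOIN `ws.ws_ui_wage_records_di` wg
       ON wg.MASTER_PERSON_INDEX = p.mpi
      AND wg.yrqtr = q.yrqtr
GROUP BY p.mpi, q.yrqtr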
Since you want all quarters and all individuals, one way to achieve this is to start by building all individual-quarter combinations in your data and use that as a 'driver' in a left join, like this:
select
    Pers.MPI
    , Qtr.Quarter
    , coalesce(W.Wage, 0) as Wage
    , ...
from
    (select distinct MPI from YourTable) Pers
cross join
    (select distinct Quarter from DateDimensionTable) Qtr
left join
    YourTable W
    on  W.MPI = Pers.MPI
    and W.Quarter = Qtr.Quarter
If your table has all the periods you are interested in, you can use YourTable instead of DateDimensionTable. But if it doesn't (and I guess that can't be guaranteed), then you can use a Date/Calendar table here if you have one, or dynamically generate the quarters between the min and max quarter in YourTable (just search for those terms). You can also hardcode them as you have in your query (as JBontje recommended).
If a combination is missing from YourTable then the Wage for that combo will be null; coalesce treats it as zero.

Return count of total group membership when providers are part of a group

TABLE A: Pre-joined table - Holds a list of providers who belong to a group and the group the provider belongs to. Columns are something like this:
ProviderID (PK, FK) | ProviderName | GroupID | GroupName
1234 | LocalDoctor | 987 | LocalDoctorsUnited
5678 | Physican82 | 987 | LocalDoctorsUnited
9012 | Dentist13 | 153 | DentistryToday
0506 | EyeSpecial | 759 | OphtaSpecialist
TABLE B: Another pre-joined table, holds a list of providers and their demographic information. Columns as such:
ProviderID (PK,FK) | ProviderName | G_or_I | OtherColumnsThatArentInUse
1234 | LocalDoctor | G | Etc.
5678 | Physican82 | G | Etc.
9012 | Dentist13 | I | Etc.
0506 | EyeSpecial | I | Etc.
The expected result is something like this:
ProviderID | ProviderName | ProviderStatus | GroupCount
1234 | LocalDoctor | Group | 2
5678 | Physican82 | Group | 2
9012 | Dentist13 | Individual | N/A
0506 | EyeSpecial | Individual | N/A
The goal is to determine whether or not a provider belongs to a group or operates individually, by the G_or_I column. If the provider belongs to a group, I need to include an additional column that provides the count of total providers in that group.
The Group/Individual portion is relatively easy - I've done something like this:
SELECT DISTINCT
    A.ProviderID,
    A.ProviderName,
    CASE
        WHEN B.G_or_I = 'G' THEN 'Group'
        WHEN B.G_or_I = 'I' THEN 'Individual'
    END AS ProviderStatus
FROM TableA A
LEFT OUTER JOIN TableB B
    ON A.ProviderID = B.ProviderID;
So far so good, this returns the expected results based on the G_or_I flag.
However, I can't seem to wrap my head around how to complete the COUNT portion. I feel like I may be overthinking it, and stuck in a loop of errors. Some things I've tried:
Add a second CASE STATEMENT:
CASE
    WHEN B.G_or_I = 'G'
    THEN (
        SELECT CountedGroups
        FROM (
            SELECT ProviderID, count(GroupID) AS CountedGroups
            FROM TableA
            WHERE A.ProviderID = B.ProviderID
            GROUP BY ProviderID --originally had this as ORDER BY, but that was a mis-type on my part
        )
    )
    ELSE 'N/A'
END
This returns an error stating that a single-row sub-query is returning more than one row. If I limit the number of rows returned to 1, the CountedGroups column returns 1 for every row. This makes me think that it's not performing the count function as I expect it to.
I've also tried including a direct count of TableA as a factored sub-query:
WITH CountedGroups AS (
    SELECT ProviderID, count(GroupID) AS GroupSum
    FROM TableA
    GROUP BY ProviderID --originally had this as ORDER BY, but that was a mis-type on my part
) --This as a standalone query works just fine
SELECT DISTINCT
    A.ProviderID,
    A.ProviderName,
    CASE
        WHEN B.G_or_I = 'G' THEN 'Group'
        WHEN B.G_or_I = 'I' THEN 'Individual'
    END AS ProviderStatus,
    CASE
        WHEN B.G_or_I = 'G' THEN GroupSum
        ELSE 'N/A'
    END
FROM CountedGroups CG
JOIN TableA A
    ON CG.ProviderID = A.ProviderID
LEFT OUTER JOIN TableB B
    ON A.ProviderID = B.ProviderID
This returns either null or completely incorrect column values
Other attempts have been a number of variations of this, with a mix of bad results or Oracle errors. As I mentioned above, I'm probably way overthinking it and the solution could be rather simple. Apologies if the information is confusing or I've not provided enough detail. The real tables have a lot of private medical information, and I tried to translate the essence of the issue as best I could.
Thank you.
You can use a CASE..WHEN expression and the analytic function COUNT as follows:
SELECT
    A.PROVIDERID,
    A.PROVIDERNAME,
    CASE
        WHEN B.G_OR_I = 'G' THEN 'Group'
        ELSE 'Individual'
    END AS PROVIDERSTATUS,
    CASE
        WHEN B.G_OR_I = 'G'
        THEN TO_CHAR(COUNT(1) OVER (PARTITION BY A.GROUPID))
        ELSE 'N/A'
    END AS GROUPCOUNT
FROM TABLE_A A
JOIN TABLE_B B ON A.PROVIDERID = B.PROVIDERID;
TO_CHAR is needed on COUNT because all result expressions in a CASE..WHEN must be of the same data type.
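To see the rule in isolation:
-- Fails with ORA-00932 (inconsistent datatypes): NUMBER vs CHAR branches.
SELECT CASE WHEN 1 = 1 THEN 42 ELSE 'N/A' END FROM dual;
-- Works: both branches are now character data.
SELECT CASE WHEN 1 = 1 THEN TO_CHAR(42) ELSE 'N/A' END FROM dual;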
Your problem seems to be that you are missing a column. You need to add group name, otherwise you won't be able to differentiate rows for the same practitioner who works under multiple business entities (groups). This is probably why you have a DISTINCT on your query. Things looked like duplicates which weren't. Once you've done that, just use an analytic function to figure out the rest:
SELECT ta.providerid,
       ta.providername,
       DECODE(tb.g_or_i, 'G', 'Group', 'I', 'Individual') AS ProviderStatus,
       ta.groupname,
       CASE
           WHEN tb.g_or_i = 'G'
           -- TO_CHAR because both CASE branches must share one data type
           THEN TO_CHAR(COUNT(DISTINCT ta.providerid) OVER (PARTITION BY ta.groupid))
           ELSE 'N/A'
       END AS GROUP_COUNT
FROM table_a ta
INNER JOIN table_b tb ON ta.providerid = tb.providerid
Is it possible that your LEFT JOIN was going the wrong direction? It makes more sense that your base demographic table would have all practitioners in it and then the Group table might be missing some records. For instance if the solo prac was operating under their own SSN and Type I NPI without applying for a separate Type II NPI or TIN.

If the count value of the column is greater than 1, I want to print the count of the column, else I want to print the value in the field

I am writing a query which fetches details from different tables. In one column I want to print the count of a column's values: if the count is greater than 1, print the count, else print the value in the field.
I want to build a query which will give me the count of user_id from tables 1 and 2. If the count of user_id is greater than 1, print count(user_id); else print the value of user_id.
Table:1
| user_id |
| John |
| Bob |
| Kris |
| Tom |
Table:2
| user_id |
| Rob |
The query result should list the count for Table 1, as it is greater than 1; for Table 2 it should list Rob, as its count is less than 2.
You want to select user IDs (names actually) from a table. If it's just one row then show that name, otherwise show the number of entries instead. So, just use a CASE expression to check whether count is 1 or greater than 1.
You probably need CAST or CONVERT to turn the count number into a string, so the CASE expression always returns the same type (this is how CASE works).
select
    case when count(*) > 1
         then cast(count(*) as varchar(100))
         else max(user_id)
    end as name_or_count
from mytable
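With the sample data this returns '4' for Table 1 (four user IDs) and 'Rob' for Table 2.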
Window Functions come to mind but since your user_ids are not numbers, you'll run into an issue where you can't have two different data types in the same column. See how this works for you. Make sure to cast the varchar numbers back to integer if this script is part of a larger process.
with cte as (
    select 'John' as user_id union all
    select 'Bob' as user_id union all
    select 'Kris' as user_id union all
    select 'Tom' as user_id
)
select distinct
    case when count(*) over () > 1
         then cast(count(*) over () as varchar)
         else user_id
    end
from cte;

with cte as (
    select 'Rob' as user_id
)
select distinct
    case when count(*) over () > 1
         then cast(count(*) over () as varchar)
         else user_id
    end
from cte;

How to get substring for filter and group by clause in AWS Redshift database

How can I extract substrings from a column, for use in filter and GROUP BY clauses, in an AWS Redshift database?
I have a table with records like:
Table_Id | Categories | Value
<ID> | ABC1; ABC1-1; XYZ | 10
<ID> | ABC1; ABC1-2; XYZ | 15
<ID> | XYZ | 5
.....
Now I want to filter records based on individual category like 'ABC1' or 'ABC1 and XYZ'
Expected output from query would like:
Table_Id | Categories | Value
<ID> | ABC1 | 25
<ID> | ABC1-1 | 10
<ID> | ABC1-2 | 15
<ID> | XYZ | 30
.....
So I need to group the results based on individual categories.
If you have at most 3 values in any "categories" cell you can unnest the cells, get the list of unique values and use that list in a join condition like this:
WITH
values as (
    select distinct category
    from (
        select distinct split_part(categories, ';', 1) as category from your_table
        union select distinct split_part(categories, ';', 2) from your_table
        union select distinct split_part(categories, ';', 3) from your_table
    )
    where nullif(category, '') is not null
)
SELECT
    t2.category
    ,sum(t1.value)
FROM your_table t1
JOIN values t2
    ON split_part(categories, ';', 1) = t2.category
    OR split_part(categories, ';', 2) = t2.category
    OR split_part(categories, ';', 3) = t2.category
GROUP BY t2.category -- required for the SUM aggregate
If you have more than 3 options, just add another split_part level both in the WITH part and in the join condition.
@JonScott, @AlexYes, and other pals who struggle with similar situations:
I found a better approach than the one suggested by @AlexYes.
What I did: I flattened the categories column so that each category becomes an individual record, which I can then process further.
Query:
select row_number() over (order by 1) as r1,
       to_char(timestamptz 'epoch' + date_time * interval '1 second', 'yyyy-mm-dd') AS DAY,
       split_part(categories, ';', numbers.n) as catg,
       value
from <TABLE>
join numbers
     on numbers.n <= regexp_count(categories, ';') + 1 <OTHER_CONDITIONS>
Explanation:
Two functions are useful here: first, the split_part function, which takes a string, splits it on ';' delimiter, and returns the first, second, ... , nth value specified from the split string; second, regexp_count, which tells us how many times a particular pattern is found in our string.
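Note that this assumes a numbers table holding the integers 1, 2, 3, ... If you don't already have one, here is a rough way to seed it in Redshift (my assumption, not part of the original answer):
-- Build a small numbers table from any table with enough rows.
create temp table numbers as
select row_number() over () as n
from <TABLE>   -- any table with at least as many rows as the maximum
limit 100;     -- number of categories in a single cell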
To do this fully dynamically, you need to transpose or pivot the values in the "categories" column into separate rows.
Unfortunately, a fully dynamic solution (without knowing the different values beforehand) is not possible in Redshift.
Your options are as follows:
1. Use the method suggested by AlexYes in another answer. This is semi-dynamic and is probably your best option.
2. Outside of Redshift, run some ETL code to perform the column -> multiple rows transformation.
3. Create a hardcoded solution, and perform the pivot something like this:
select table_id, 'ABC1' as category,
       case when concat(Categories, ';') ilike '%ABC1;%' then value else 0 end as value
from your_table
union all
select table_id, 'ABC1-1' as category,
       case when concat(Categories, ';') ilike '%ABC1-1;%' then value else 0 end as value
from your_table
union all
etc

How to write an Oracle query to find the total length of possibly overlapping from-to dates

I'm struggling to find the query for the following task.
I have the following data and want to find the total network days for each unique ID:
ID From To NetworkDay
1 03-Sep-12 07-Sep-12 5
1 03-Sep-12 04-Sep-12 2
1 05-Sep-12 06-Sep-12 2
1 06-Sep-12 12-Sep-12 5
1 31-Aug-12 04-Sep-12 3
2 04-Sep-12 06-Sep-12 3
2 11-Sep-12 13-Sep-12 3
2 05-Sep-12 08-Sep-12 3
The problem is that the date ranges can overlap, and I can't come up with SQL that will give me the following results:
ID From To NetworkDay
1 31-Aug-12 12-Sep-12 9
2 04-Sep-12 08-Sep-12 4
2 11-Sep-12 13-Sep-12 3
and then
ID Total Network Day
1 9
2 7
If the network day calculation is not possible, just getting to the second table would be sufficient.
I hope my question is clear.
We can use Oracle analytics, namely the "OVER ... PARTITION BY" clause, to do this. The PARTITION BY clause is kind of like a GROUP BY but without the aggregation part. That means we can group rows together (i.e. partition them) and then perform an operation on them as separate groups. As we operate on each row we can then access the columns of the previous row above. This is the feature PARTITION BY gives us. (PARTITION BY is not related to partitioning of a table for performance.)
So then how do we output the non-overlapping dates? We first order the query based on the (ID, DFROM) fields, then we use the ID field to make our partitions (row groups). We then test the previous row's TO value and the current row's FROM value for overlap using an expression like this (in pseudocode):
max(previous.DTO, current.DFROM) as DFROM
This basic expression will return the original DFROM value if it doesn't overlap, but will return the previous TO value if there is overlap. Since our rows are ordered we only need to be concerned with the last row. In cases where a previous row completely overlaps the current row, we want that row to have a 'zero' date range. So we do the same thing for the DTO field to get:
max(previous.DTO, current.DFROM) as DFROM, max(previous.DTO, current.DTO) as DTO
Once we have generated the new results set with the adjusted DFROM and DTO values, we can aggregate them up and count the range intervals of DFROM and DTO.
Be aware that most date calculations in databases are not inclusive, unlike your data, so something like DATEDIFF(dto, dfrom) will not include the day dto actually refers to. We therefore want to adjust dto up a day first.
I don't have access to an Oracle server anymore, but I know this is possible with Oracle analytics. The query should go something like this:
(Please update my post if you get this to work.)
SELECT id,
       GREATEST(dfrom, NVL(MAX(dto) OVER (PARTITION BY id ORDER BY dfrom
                                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), dfrom)) AS dfrom,
       GREATEST(dto,   NVL(MAX(dto) OVER (PARTITION BY id ORDER BY dfrom
                                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), dto))   AS dto
FROM (
    -- adjust the table so that dto becomes non-inclusive
    SELECT id, dfrom, dto + 1 AS dto FROM my_sample
) t;
The secret here is the MAX(dto) OVER (PARTITION BY id ORDER BY dfrom ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) expression, which returns the largest DTO value from the rows before the current row (NVL covers the first row of each partition, which has no predecessor).
So this query should output new dfrom/dto values which don't overlap. It's then a simple matter of sub-querying this, doing (dto - dfrom), and summing the totals.
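For example, if the non-overlapping query above were saved as a view named nonoverlapped (a name used here just for illustration):
-- dto was already made non-inclusive, so dto - dfrom counts the days.
SELECT id, SUM(dto - dfrom) AS total_days
FROM nonoverlapped
GROUP BY id;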
Using MySQL
I did have access to a MySQL server, so I did get it working there. MySQL doesn't have result-set partitioning (analytics) like Oracle, so we have to use result-set variables. This means we use @var := xxx type expressions to remember the last date value and adjust dfrom/dto accordingly. Same algorithm, just a little longer and with more complex syntax. We also have to forget the last date value any time the ID field changes!
So here is the sample table (same values you have):
create table sample(id int, dfrom date, dto date, networkDay int);
insert into sample values
(1,'2012-09-03','2012-09-07',5),
(1,'2012-09-03','2012-09-04',2),
(1,'2012-09-05','2012-09-06',2),
(1,'2012-09-06','2012-09-12',5),
(1,'2012-08-31','2012-09-04',3),
(2,'2012-09-04','2012-09-06',3),
(2,'2012-09-11','2012-09-13',3),
(2,'2012-09-05','2012-09-08',3);
On to the query, we output the un-grouped result set like above:
The variable @ldt is "last date", and the variable @lid is "last id". Anytime @lid changes, we reset @ldt to null. FYI, in MySQL the := operator is where the assignment happens; an = operator is just an equality test.
This is a 3-level query, but it could be reduced to 2; I went with an extra outer query to keep things more readable. The innermost query simply adjusts the dto column to be non-inclusive and does the proper row ordering. The middle query adjusts the dfrom/dto values to make them non-overlapping. The outer query drops the unused fields and calculates the interval range.
set @ldt = null, @lid = null;

select id, no_dfrom as dfrom, no_dto as dto, datediff(no_dto, no_dfrom) as days
from (
    select if(@lid = id, @ldt, @ldt := null) as last,
           dfrom,
           dto,
           if(@ldt >= dfrom, @ldt, dfrom) as no_dfrom,
           if(@ldt >= dto, @ldt, dto) as no_dto,
           @ldt := if(@ldt >= dto, @ldt, dto),
           @lid := id as id,
           datediff(dto, dfrom) as overlapped_days
    from (select id, dfrom, dto + INTERVAL 1 DAY as dto from sample order by id, dfrom) as sample
) as nonoverlapped
order by id, dfrom;
The above query gives the results (notice dfrom/dto are non-overlapping here):
+------+------------+------------+------+
| id | dfrom | dto | days |
+------+------------+------------+------+
| 1 | 2012-08-31 | 2012-09-05 | 5 |
| 1 | 2012-09-05 | 2012-09-08 | 3 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-13 | 5 |
| 2 | 2012-09-04 | 2012-09-07 | 3 |
| 2 | 2012-09-07 | 2012-09-09 | 2 |
| 2 | 2012-09-11 | 2012-09-14 | 3 |
+------+------------+------------+------+
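Summing the days column per id then gives 13 for id 1 and 8 for id 2. Note that these are calendar days; the question's expected 9 and 7 count working days only, so weekend/holiday filtering would still need to be applied on top (the MODEL-clause answer below does this).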
How about constructing a SQL query which merges intervals by removing holes and considering only maximal intervals? It goes like this (not tested):
SELECT DISTINCT F.ID, F.From, L.To
FROM Temp AS F, Temp AS L
WHERE F.From < L.To AND F.ID = L.ID
AND NOT EXISTS (SELECT *
FROM Temp AS T
WHERE T.ID = F.ID
AND F.From < T.From AND T.From < L.To
AND NOT EXISTS ( SELECT *
FROM Temp AS T1
WHERE T1.ID = F.ID
AND T1.From < T.From
AND T.From <= T1.To)
)
AND NOT EXISTS (SELECT *
FROM Temp AS T2
WHERE T2.ID = F.ID
AND (
(T2.From < F.From AND F.From <= T2.To)
OR (T2.From < L.To AND L.To < T2.To)
)
)
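To get from the merged intervals to the totals in the question, one more aggregation would be needed. A sketch, assuming the merge query above is saved as a view named merged_intervals (a name introduced here purely for illustration), with From/To quoted since they are reserved words:
-- Totals per ID over the merged, non-overlapping intervals (calendar days).
SELECT ID, SUM("To" - "From" + 1) AS total_days
FROM merged_intervals
GROUP BY ID;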
with t_data as (
select 1 as id,
to_date('03-sep-12','dd-mon-yy') as start_date,
to_date('07-sep-12','dd-mon-yy') as end_date from dual
union all
select 1,
to_date('03-sep-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('05-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('06-sep-12','dd-mon-yy'),
to_date('12-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('31-aug-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('04-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('11-sep-12','dd-mon-yy'),
to_date('13-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('05-sep-12','dd-mon-yy'),
to_date('08-sep-12','dd-mon-yy') from dual
),
t_holidays as (
select to_date('01-jan-12','dd-mon-yy') as holiday
from dual
),
t_data_rn as (
select rownum as rn, t_data.* from t_data
),
t_model as (
    select distinct id,
           start_date
    from   t_data_rn
    model
        partition by (rn, id)
        dimension by (0 as i)
        measures (start_date, end_date)
        rules (
            start_date[for i from 1 to end_date[0] - start_date[0] increment 1]
                = start_date[0] + cv(i),
            end_date[any] = start_date[cv()] + 1
        )
    order by 1, 2
),
t_network_days as (
select t_model.*,
case when
mod(to_char(start_date, 'j'), 7) + 1 in (6, 7)
or t_holidays.holiday is not null
then 0 else 1
end as working_day
from t_model
left outer join t_holidays
on t_holidays.holiday = t_model.start_date
)
select id,
sum(working_day) as network_days
from t_network_days
group by id;
t_data - your initial data
t_holidays - contains list of holidays
t_data_rn - just adds unique key (rownum) to each row of t_data
t_model - expands t_data date ranges into a flat list of dates
t_network_days - marks each date from t_model as working day or weekend based on day of week (Sat and Sun) and holidays list
final query - calculates the number of network days for each id.