Find the minimum iterations in a loop - BigQuery - sql

I'm trying to write a BigQuery SQL query to find the minimum number of iterations needed to cover all items.
In detail, say I have a table with 3 columns: Site, Router, PC.
Each site has multiple routers that are connected to different PCs.
I want to find the minimum set of routers in each site that covers ALL the PCs in the site.
For example: In site X I have 5 different routers connected to 9 PCs, but I can keep only 3 routers and still have full coverage of the PCs. Routers AAA, BBB and DDD can see all 9 PCs, so I can drop routers CCC and EEE.
Raw data:
with pop as (
select 'X' as site,'AAA' as router, 1 as pc union all
select 'X' as site,'AAA' as router, 2 as pc union all
select 'X' as site,'AAA' as router, 3 as pc union all
select 'X' as site,'AAA' as router, 4 as pc union all
select 'X' as site,'AAA' as router, 5 as pc union all
select 'X' as site,'BBB' as router, 4 as pc union all
select 'X' as site,'BBB' as router, 6 as pc union all
select 'X' as site,'BBB' as router, 7 as pc union all
select 'X' as site,'CCC' as router, 2 as pc union all
select 'X' as site,'CCC' as router, 4 as pc union all
select 'X' as site,'CCC' as router, 7 as pc union all
select 'X' as site,'DDD' as router, 1 as pc union all
select 'X' as site,'DDD' as router, 8 as pc union all
select 'X' as site,'DDD' as router, 9 as pc union all
select 'X' as site,'EEE' as router, 5 as pc union all
select 'X' as site,'EEE' as router, 6 as pc union all
select 'Y' as site,'FFF' as router, 1 as pc union all
select 'Y' as site,'GGG' as router, 2 as pc union all
select 'Y' as site,'HHH' as router, 1 as pc union all
select 'Y' as site,'HHH' as router, 2 as pc union all
select 'Y' as site,'HHH' as router, 3 as pc
)
select *
from pop
Expected outcome:
Thank you in advance!

We can use a 'number table' to enumerate all combinations (representing the routers as 1 bits in an integer value and using the bitwise AND function):
with MyTbl as (
select *
from (
values
('X','AAA',1) ,('X','AAA',2) ,('X','AAA',3) ,('X','AAA',4) ,('X','AAA',5) ,('X','BBB',4)
,('X','BBB',6) ,('X','BBB',7) ,('X','CCC',2) ,('X','CCC',4) ,('X','CCC',7) ,('X','DDD',1)
,('X','DDD',8) ,('X','DDD',9) ,('X','EEE',5) ,('X','EEE',6) ,('Y','FFF',1) ,('Y','GGG',2)
,('Y','HHH',1) ,('Y','HHH',2) ,('Y','HHH',3)
) T (Site, Router, PC)
),
Nos as (
-- Up to 8 routers per site are supported
select T0.N+T1.N*2+T2.N*4+T3.N*8+T4.N*16+T5.N*32+T6.N*64+T7.N*128 as Combtn
from
(values (0),(1)) T0(N), (values (0),(1)) T1(N), (values (0),(1)) T2(N), (values (0),(1)) T3(N),
(values (0),(1)) T4(N), (values (0),(1)) T5(N), (values (0),(1)) T6(N), (values (0),(1)) T7(N)
),
Routers as (
select Site
, Router
, power(2, row_number() over (partition by Site order by Router) - 1) as RouterSeqBM -- first router gets bit 0, so all 8 bits are usable
from (select distinct Site, Router from MyTbl) S1
),
RouterCombos as (
select R.Site, N.Combtn, R.Router
from Nos N
inner join Routers R
on R.RouterSeqBM & N.Combtn <> 0 --< Adjust this &
)
select RC.Site, RC.Router
from
(
select *
from (
-- Choose the combination with the fewest routers
select Seqd.*
, row_number()
over (partition by Site
order By NumRouters) as NumRouterSeq
from (
-- count routers for each site covering all PCs
select Site, Combtn, count(*) as NumRouters
from RouterCombos RC
where not exists -- Make sure that all PCs are covered
(select 1
from MyTbl MT
where MT.Site=RC.Site
and MT.PC not in
(select distinct PC
from
RouterCombos RC2
inner join MyTbl MT2
on MT2.Site=RC2.Site
and MT2.Router=RC2.Router
where RC2.Combtn=RC.Combtn
and RC2.Site=RC.Site))
group by Site, Combtn
) Seqd
) Seqd1
where Seqd1.NumRouterSeq=1
) BestRouterCombs
inner join
RouterCombos RC
on RC.Site=BestRouterCombs.Site
and RC.Combtn=BestRouterCombs.Combtn
This works in SQL Server; you would need to adjust it for your environment.
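To make the enumeration idea easier to follow outside SQL, here is a minimal Python sketch of the same brute-force approach: every integer from 1 to 2^n - 1 is treated as a bitmask over the routers of a site, and the smallest mask that covers all PCs wins. The function name `min_router_cover` and the `ROWS` list are illustrative assumptions, not part of the answer above, and the approach is only practical for a small number of routers per site.

```python
from collections import defaultdict

# Same rows as the question's `pop` CTE: (site, router, pc).
ROWS = [
    ("X", "AAA", 1), ("X", "AAA", 2), ("X", "AAA", 3), ("X", "AAA", 4), ("X", "AAA", 5),
    ("X", "BBB", 4), ("X", "BBB", 6), ("X", "BBB", 7),
    ("X", "CCC", 2), ("X", "CCC", 4), ("X", "CCC", 7),
    ("X", "DDD", 1), ("X", "DDD", 8), ("X", "DDD", 9),
    ("X", "EEE", 5), ("X", "EEE", 6),
    ("Y", "FFF", 1), ("Y", "GGG", 2),
    ("Y", "HHH", 1), ("Y", "HHH", 2), ("Y", "HHH", 3),
]


def min_router_cover(rows):
    """Return {site: smallest list of routers that reaches every PC of the site}."""
    pcs_by_router = defaultdict(lambda: defaultdict(set))
    for site, router, pc in rows:
        pcs_by_router[site][router].add(pc)

    best_by_site = {}
    for site, routers in pcs_by_router.items():
        names = sorted(routers)
        all_pcs = set().union(*routers.values())
        best = None
        # Each integer 1 .. 2^n - 1 is a bitmask; bit i set means names[i] is kept.
        for mask in range(1, 1 << len(names)):
            chosen = [name for i, name in enumerate(names) if (mask >> i) & 1]
            covered = set().union(*(routers[name] for name in chosen))
            if covered == all_pcs and (best is None or len(chosen) < len(best)):
                best = chosen
        best_by_site[site] = best
    return best_by_site


print(min_router_cover(ROWS))
```

For the question's data this keeps AAA, BBB and DDD for site X and only HHH for site Y. When several combinations tie on size, the sketch keeps the first one it encounters.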

Related

BigQuery recursively join based on links between 2 ID columns

Given a table representing a many-many join between IDs like the following:
WITH t AS (
SELECT 1 AS id_1, 'a' AS id_2,
UNION ALL SELECT 2, 'a'
UNION ALL SELECT 2, 'b'
UNION ALL SELECT 3, 'b'
UNION ALL SELECT 4, 'c'
UNION ALL SELECT 5, 'c'
UNION ALL SELECT 6, 'd'
UNION ALL SELECT 6, 'e'
UNION ALL SELECT 7, 'f'
)
SELECT * FROM t
|id_1|id_2|
|----|----|
|1|a|
|2|a|
|2|b|
|3|b|
|4|c|
|5|c|
|6|d|
|6|e|
|7|f|
I would like to be able to recursively join and then aggregate rows in order to find each disconnected sub-graph represented by these links, that is, each collection of IDs that are linked together.
The desired output for the example above would look something like this:
|id_1_coll|id_2_coll|
|---------|---------|
|1, 2, 3|a, b|
|4, 5|c|
|6|d, e|
|7|f|
where each row contains all the other IDs one could reach following the links in the table.
Note that 1 links to b even though there is no explicit link row, because we can follow the path 1 --> a --> 2 --> b using the links in the first 3 rows.
One potential approach is to remodel the relationships between id_1 and id_2 such that we get all the links from id_1 to itself then use a recursive common table expression to traverse all the possible paths between id_1 values then aggregate (somewhat arbitrarily) to the lowest such value that can be reached from each id_1.
Explanation
Our steps are:
1. Remodel the relationship into a series of self-joins for id_1
2. Map each id_1 to the lowest id_1 that it is linked to via a recursive CTE
3. Aggregate the recursive CTE using the lowest id_1s as the GROUP BY column and grabbing all the linked id_1 and id_2 values via the ARRAY_AGG() function
We can use something like this to remodel the relationships into a self join (1.):
SELECT
a.id_1, a.id_2, b.id_1 AS linked_id
FROM t as a
INNER JOIN t as b
ON a.id_2 = b.id_2
WHERE a.id_1 != b.id_1
Next - to set up the recursive table expression (2.) we can tweak the query above to also give us the lowest (LEAST) of the values for id_1 at each link then use this as the base iteration:
WITH RECURSIVE base_iter AS (
SELECT
a.id_1, b.id_1 AS linked_id, LEAST(a.id_1, b.id_1) AS lowest_linked_id
FROM t as a
INNER JOIN t as b
ON a.id_2 = b.id_2
WHERE a.id_1 != b.id_1
)
We can also grab the lowest id_1 value at this time:
|id_1|linked_id|lowest_linked_id|
|----|---------|----------------|
|1|2|1|
|2|1|1|
|2|3|2|
|3|2|2|
|4|5|4|
|5|4|4|
For our recursive loop, we want to maintain an ARRAY of linked ids and join each new iteration such that the id_1 value of the n+1th iteration is equal to the linked_id value of the nth iteration AND the nth linked_id value is not in the array of previously linked ids.
We can code this as follows:
recursive_loop AS (
SELECT id_1, linked_id, lowest_linked_id, [linked_id ] AS linked_ids
FROM base_iter
UNION ALL
SELECT
prev_iter.id_1, prev_iter.linked_id,
iter.lowest_linked_id,
ARRAY_CONCAT(iter.linked_ids, [prev_iter.linked_id])
FROM base_iter AS prev_iter
JOIN recursive_loop AS iter
ON iter.id_1 = prev_iter.linked_id
AND iter.lowest_linked_id < prev_iter.lowest_linked_id
AND prev_iter.linked_id NOT IN UNNEST(iter.linked_ids )
)
Giving us the following results:
|id_1|linked_id|lowest_linked_id|linked_ids|
|----|---------|------------|---|
|3|2|1|[1,2]|
|2|3|1|[1,2,3]|
|4|5|4|[5]|
|1|2|1|[2]|
|5|4|4|[4]|
|2|3|2|[3]|
|2|1|1|[1]|
|3|2|2|[2]|
which we can now link back to the original table for the id_2 values then aggregate (3.) as shown in the complete query below
Solution
WITH RECURSIVE t AS (
SELECT 1 AS id_1, 'a' AS id_2,
UNION ALL SELECT 2, 'a'
UNION ALL SELECT 2, 'b'
UNION ALL SELECT 3, 'b'
UNION ALL SELECT 4, 'c'
UNION ALL SELECT 5, 'c'
UNION ALL SELECT 6, 'd'
UNION ALL SELECT 6, 'e'
UNION ALL SELECT 7, 'f'
),
base_iter AS (
SELECT
a.id_1, b.id_1 AS linked_id, LEAST(a.id_1, b.id_1) AS lowest_linked_id
FROM t as a
INNER JOIN t as b
ON a.id_2 = b.id_2
WHERE a.id_1 != b.id_1
),
recursive_loop AS (
SELECT id_1, linked_id, lowest_linked_id, [linked_id ] AS linked_ids
FROM base_iter
UNION ALL
SELECT
prev_iter.id_1, prev_iter.linked_id,
iter.lowest_linked_id,
ARRAY_CONCAT(iter.linked_ids, [prev_iter.linked_id])
FROM base_iter AS prev_iter
JOIN recursive_loop AS iter
ON iter.id_1 = prev_iter.linked_id
AND iter.lowest_linked_id < prev_iter.lowest_linked_id
AND prev_iter.linked_id NOT IN UNNEST(iter.linked_ids )
),
link_back AS (
SELECT
t.id_1, IFNULL(lowest_linked_id, t.id_1) AS lowest_linked_id, t.id_2
FROM t
LEFT JOIN recursive_loop
ON t.id_1 = recursive_loop.id_1
),
by_id_1 AS (
SELECT
id_1,
MIN(lowest_linked_id) AS grp
FROM link_back
GROUP BY 1
),
by_id_2 AS (
SELECT
id_2,
MIN(lowest_linked_id) AS grp
FROM link_back
GROUP BY 1
),
result AS (
SELECT
by_id_1.grp,
ARRAY_AGG(DISTINCT id_1 ORDER BY id_1) AS id1_coll,
ARRAY_AGG(DISTINCT id_2 ORDER BY id_2) AS id2_coll,
FROM
by_id_1
INNER JOIN by_id_2
ON by_id_1.grp = by_id_2.grp
GROUP BY grp
)
SELECT grp, TO_JSON(id1_coll) AS id1_coll, TO_JSON(id2_coll) AS id2_coll
FROM result ORDER BY grp
Giving us the required output:
|grp|id1_coll|id2_coll|
|---|--------|--------|
|1|[1,2,3]|[a,b]|
|4|[4,5]|[c]|
|6|[6]|[d,e]|
|7|[7]|[f]|
Limitations/Issues
Unfortunately this approach is inefficient (we have to traverse every single pathway before aggregating it back together) and fails in the real-world case, where we have several million join rows. When trying to execute on this data, BigQuery runs up a huge "Slot time consumed" and eventually errors out with:
Resources exceeded during query execution: Your project or organization exceeded the maximum disk and memory limit available for shuffle operations. Consider provisioning more slots, reducing query concurrency, or using more efficient logic in this job.
I hope there might be a better way of doing the recursive join such that pathways can be merged/aggregated as we go (if we have an id_1 value AND a linked_id already in the list of linked_ids, we don't need to check it further).
Using ROW_NUMBER(), the query is as follows:
WITH RECURSIVE
t AS (
SELECT 1 AS id_1, 'a' AS id_2,
UNION ALL SELECT 2, 'a'
UNION ALL SELECT 2, 'b'
UNION ALL SELECT 3, 'b'
UNION ALL SELECT 4, 'c'
UNION ALL SELECT 5, 'c'
UNION ALL SELECT 6, 'd'
UNION ALL SELECT 6, 'e'
UNION ALL SELECT 7, 'f'
),
t1 AS (
SELECT ROW_NUMBER() OVER(ORDER BY t.id_1) n, t.id_1, t.id_2 FROM t
),
t2 AS (
SELECT n, [n] n_arr, [id_1] arr_1, [id_2] arr_2, id_1, id_2 FROM t1
WHERE n IN (SELECT MIN(n) FROM t1 GROUP BY id_1)
UNION ALL
SELECT t2.n, ARRAY_CONCAT(t2.n_arr, [t1.n]),
CASE WHEN t1.id_1 NOT IN UNNEST(t2.arr_1)
THEN ARRAY_CONCAT(t2.arr_1, [t1.id_1])
ELSE t2.arr_1 END,
CASE WHEN t1.id_2 NOT IN UNNEST(t2.arr_2)
THEN ARRAY_CONCAT(t2.arr_2, [t1.id_2])
ELSE t2.arr_2 END,
t1.id_1, t1.id_2
FROM t2 JOIN t1 ON
t2.n < t1.n AND
t1.n NOT IN UNNEST(t2.n_arr) AND
(t2.id_1 = t1.id_1 OR t2.id_2 = t1.id_2) AND
(t1.id_1 NOT IN UNNEST(t2.arr_1) OR t1.id_2 NOT IN UNNEST(t2.arr_2))
),
t3 AS (
SELECT
n,
ARRAY_AGG(DISTINCT id_1 ORDER BY id_1) arr_1,
ARRAY_AGG(DISTINCT id_2 ORDER BY id_2) arr_2
FROM t2
WHERE n IN (SELECT MIN(n) FROM t2 GROUP BY id_1)
GROUP BY n
)
SELECT n, TO_JSON(arr_1), TO_JSON(arr_2) FROM t3 ORDER BY n
t1 : Append with row numbers.
t2 : Extract rows matching either id_1 or id_2 by recursive query.
t3 : Make arrays from id_1 and id_2 with ARRAY_AGG().
However, it may not resolve the performance problem described under Limitations/Issues.
The way this question is phrased makes it appear you want "show me distinct groups from a presorted list, unchained to a previous group". For that, something like this should suffice (assuming auto-incrementing order/one or both id's move to the next value):
SELECT GrpNr,
STRING_AGG(DISTINCT CAST(id_1 as STRING), ',') as id_1_coll,
STRING_AGG(DISTINCT CAST(id_2 as STRING), ',') as id_2_coll
FROM
(
SELECT id_1, id_2,
SUM(CASE WHEN a.id_1 <> a.previous_id_1 and a.id_2 <> a.previous_id_2 THEN 1 ELSE 0 END)
OVER (ORDER BY RowNr) as GrpNr
FROM
(
SELECT *,
ROW_NUMBER() OVER () as RowNr,
LAG(t.id_1, 1) OVER (ORDER BY 1) AS previous_id_1,
LAG(t.id_2, 1) OVER (ORDER BY 1) AS previous_id_2
FROM t
) a
ORDER BY RowNr
) a
GROUP BY GrpNr
ORDER BY GrpNr
I don't think this is the question you mean to ask. This seems to be a graph-walking problem as referenced in the other answers, and in the response from @GordonLinoff to the question here, which I tested (and presume works for BigQuery).
This can also be done using sequential updates as done by @RomanPekar here (which I also tested). The main consideration seems to be performance. I'd assume DBMSs have gotten better at recursion since this was posted.
Rolling it up in either case should be fairly easy using STRING_AGG() as given above or as you have.
I'd be curious to see a more accurate representation of the data. If there is some consistency to how the data is stored/limitations to levels of nesting/other group structures there may be a shortcut approach other than recursion or iterative updates.
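Since this is a graph-walking (connected components) problem, it may help to see the grouping outside SQL. Below is a minimal Python sketch using a union-find structure, which merges groups as links are read instead of enumerating every path. It is a swapped-in technique for illustration, not something any of the answers above use; the names `find`, `linked_groups` and `LINKS` are hypothetical.

```python
from collections import defaultdict

# Same link rows as the question's table t: (id_1, id_2).
LINKS = [(1, "a"), (2, "a"), (2, "b"), (3, "b"), (4, "c"),
         (5, "c"), (6, "d"), (6, "e"), (7, "f")]


def find(parent, node):
    """Return the representative of node's group, shortening the path as we go."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def linked_groups(links):
    """Return a sorted list of (id_1 values, id_2 values), one entry per sub-graph."""
    parent = {}
    for id_1, id_2 in links:
        # Tag each node with its column so id_1 and id_2 values can never collide.
        left, right = ("id_1", id_1), ("id_2", id_2)
        parent.setdefault(left, left)
        parent.setdefault(right, right)
        root_left, root_right = find(parent, left), find(parent, right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two groups

    groups = defaultdict(lambda: (set(), set()))
    for node in list(parent):
        column, value = node
        groups[find(parent, node)][0 if column == "id_1" else 1].add(value)
    return sorted((sorted(ids_1), sorted(ids_2)) for ids_1, ids_2 in groups.values())


for ids_1, ids_2 in linked_groups(LINKS):
    print(ids_1, ids_2)
```

On the sample rows this produces the four groups from the desired output: 1, 2, 3 with a, b; 4, 5 with c; 6 with d, e; and 7 with f. Each link row is processed once, which is the "merge as we go" behaviour the question asks about, although translating it to BigQuery would need a different formulation.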

Aggregating over an Event table based on time-window periods configured in another table

I have three tables, UpEvent, DownEvent and AnalysisWindow
UpEvent:
up_event_id | event_date | EventMetric
1 2015-01-01T06:00:00 54
2 2015-01-01T07:30:00 76
DownEvent:
down_event_id | event_date | EventMetric
1 2015-01-01T06:46:00 22
2 2015-01-01T07:33:00 34
AnalysisWindow:
window_id | win_start | win_end
1 2015-01-01T00:00:00 2015-01-01T04:00:00
2 2015-01-01T00:00:00 2015-01-01T08:00:00
.
.
I want to run an analysis for each AnalysisWindow in order to aggregate the UpEvents and DownEvents that occurred within the defined window.
So for each AnalysisWindow record I would end up with 1 feature row:
WinStart | WinEnd | TotalUpEvents | TotalDownEvents
2015-01-01T00:00:00 2015-01-01T04:00:00 0 0
2015-01-01T00:00:00 2015-01-01T08:00:00 2 2
My first thought was to do something like
select win.win_start,
win.win_end,
count(ue.*),
sum(ue.EventMetric)
from AnalysisWindow win
left join UpEvent ue on (ue.event_date between win.win_start and win.win_end)
Which obviously doesn't work.
Am I approaching this problem incorrectly? I want to do a windowed analysis of the tables at various windows that I configure and get 1 aggregate record per window
Below is for BigQuery Standard SQL (and actually works!)
#standardSQL
WITH ue_win AS (
SELECT
window_id, COUNT(1) TotalUpEvents
FROM `project.dataset.AnalysisWindow` win
CROSS JOIN `project.dataset.UpEvent` ue
WHERE ue.event_date BETWEEN win.win_start AND win.win_end
GROUP BY window_id
), de_win AS (
SELECT
window_id, COUNT(1) TotalDownEvents
FROM `project.dataset.AnalysisWindow` win
CROSS JOIN `project.dataset.DownEvent` de
WHERE de.event_date BETWEEN win.win_start AND win.win_end
GROUP BY window_id
)
SELECT
window_id, win_start, win_end,
IFNULL(TotalUpEvents, 0) TotalUpEvents,
IFNULL(TotalDownEvents, 0) TotalDownEvents
FROM `project.dataset.AnalysisWindow` win
LEFT JOIN ue_win USING(window_id)
LEFT JOIN de_win USING(window_id)
One method uses correlated subqueries:
select aw.*,
(select count(*)
from UpEvent ue
where ue.event_date between aw.win_start and aw.win_end
) as ups,
(select count(*)
from DownEvent de
where de.event_date between aw.win_start and aw.win_end
) as downs
from AnalysisWindow aw;
The above works, at least when formulated as:
with UpEvent as (
select 1 as up_event_id, '2015-01-01T06:00:00' as event_date, 54 as EventMetric union all
select 2, '2015-01-01T07:30:00', 76
),
DownEvent as (
select 1 as down_event_id, '2015-01-01T06:46:00' as event_date, 22 as EventMetric union all
select 2, '2015-01-01T07:33:00', 34
),
AnalysisWindow as (
select 1 as window_id , '2015-01-01T00:00:00' as win_start, '2015-01-01T04:00:00' as win_end union all
select 2, '2015-01-01T00:00:00', '2015-01-01T08:00:00'
)
select aw.*,
(select count(*)
from UpEvent ue
where ue.event_date between aw.win_start and aw.win_end
) as ups,
(select count(*)
from DownEvent de
where de.event_date between aw.win_start and aw.win_end
) as downs
from AnalysisWindow aw;
The alternative is to use union all:
with ud as (
select event_date, 1 as ups, 0 as downs from UpEvent
union all
select event_date, 0 as ups, 1 as downs from DownEvent
)
select aw.window_id, aw.win_start, aw.win_end, sum(ups), sum(downs)
from AnalysisWindow aw join
ud
ON ud.event_date between aw.win_start and aw.win_end
group by aw.window_id, aw.win_start, aw.win_end
union all
select aw.window_id, aw.win_start, aw.win_end, 0, 0
from AnalysisWindow aw
where not exists (select 1 from ud where ud.event_date between aw.win_start and aw.win_end)
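As a cross-check of what all three queries are meant to return, here is a small Python sketch that counts events per window with inclusive bounds, mirroring BETWEEN. The helper names `count_in_window` and `summarize` are assumptions made for illustration; the data is the sample from the question.

```python
from datetime import datetime

# Sample data from the question, as ISO 8601 strings.
UP_EVENTS = ["2015-01-01T06:00:00", "2015-01-01T07:30:00"]
DOWN_EVENTS = ["2015-01-01T06:46:00", "2015-01-01T07:33:00"]
WINDOWS = [
    (1, "2015-01-01T00:00:00", "2015-01-01T04:00:00"),
    (2, "2015-01-01T00:00:00", "2015-01-01T08:00:00"),
]


def count_in_window(events, start, end):
    """Count events whose timestamp lies in [start, end], like SQL BETWEEN."""
    return sum(1 for event in events if start <= datetime.fromisoformat(event) <= end)


def summarize(windows, up_events, down_events):
    """Return one (window_id, total_up, total_down) row per window, zeros included."""
    rows = []
    for window_id, win_start, win_end in windows:
        start = datetime.fromisoformat(win_start)
        end = datetime.fromisoformat(win_end)
        rows.append((window_id,
                     count_in_window(up_events, start, end),
                     count_in_window(down_events, start, end)))
    return rows


print(summarize(WINDOWS, UP_EVENTS, DOWN_EVENTS))
```

Window 1 comes out with zero events of either kind and window 2 with two of each, matching the feature rows the question expects. Note that windows with no events still appear, which is what the LEFT JOIN and the extra union all branch take care of in the SQL versions.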

How to check missing number sequence with wanted skips

Based on: How to check any missing number from a series of numbers?
I've got a similar question. My source table has a sequence from 1 to 1000.
But a gap only counts as bad if it is >1 and <20. I can't get the CONNECT BY to work.
Please help me.
SELECT
'XX' AS NETWORK
,'YY' AS TYPE
,min_seq - 1 + level AS MISSING
FROM (
select
min(s.SEQUENCE_NUMBER) min_seq
, max(s.SEQUENCE_NUMBER) max_seq
FROM source s
)
CONNECT BY level <= max_seq - min_seq +20 AND level >= max_seq - min_seq +1
MINUS
SELECT
'XX' AS NETWORK
,'YY' AS TYPE
,s.SEQUENCE_NUMBER AS EXISTING
FROM source s
Old school connect by version
with tn as(
-- sample data
Select 1 n from dual
union all
Select 4 from dual
union all
Select 26 from dual
union all
Select 30 from dual
union all
Select 52 from dual
)
select distinct n, delta, n+level nn
from (
select n, delta
from (
select n, lead(n) Over(order by n) - n delta
from tn) t
where delta between 2 and 20
) t2
connect by level < delta
order by n
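The logic of this query (compute the delta to the next value, keep only deltas between 2 and 20, then expand each kept gap) can be sketched in Python as follows. The function name `missing_in_small_gaps` and its default bounds are assumptions chosen to mirror `delta between 2 and 20`; adjust the bounds if the strict "<20" from the question is what you need.

```python
def missing_in_small_gaps(numbers, min_delta=2, max_delta=20):
    """List the missing values, but only inside gaps whose delta is in the given range."""
    ordered = sorted(set(numbers))
    missing = []
    # Pair each value with its successor, like LEAD(n) OVER (ORDER BY n).
    for current, following in zip(ordered, ordered[1:]):
        delta = following - current
        if min_delta <= delta <= max_delta:
            missing.extend(range(current + 1, following))
    return missing


print(missing_in_small_gaps([1, 4, 26, 30, 52]))
```

With the sample data from the answer, only the gaps 1 to 4 and 26 to 30 qualify, so the missing values reported are 2, 3, 27, 28 and 29; the larger jumps (4 to 26 and 30 to 52) are treated as wanted skips.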
Use a CTE (with statement):
with CTE as
(
select level as NN
from dual
connect by level <= 20
)
select CTE.NN
from CTE
left join source s
on CTE.NN = s.SEQUENCE_NUMBER
where s.SEQUENCE_NUMBER is null

Oracle join using two possible column data

I work with an Oracle 12 database that represents mainframe data. Here is my question.
We have two levels of hierarchy, "System" and "Prin". Imagine them as state and county in the USA. Sometimes, a client will build everything at System level and all of its children will always reference the System configuration. Other clients build at Prin level, and any child of the prin will first have to look at the Prin level data for configuration; if the prin is not built in the table, it defaults to the system level config. Pretty easy.
Here's where I can't get the table join to work. A single client can have some systems built at the system level, and others at the prin level. How can I dynamically join when I am not sure what configuration the client is using in that specific prin?
Example:
WITH tbl as (
select 80 SYSTEM, 0 PRIN, 2 DATA from dual
union
select 80 , 1 , 3 from dual
union
select 80 , 2 , 4 from dual
)
Now if I have an item located in system 80 prin 3, it will need the prin 0 data, because 0 denotes the "system" config.
So if I have prin 1, I want data "3"; if I have prin 2, data "4"; if I have prin 8, I want data "2", because there is no prin 8 config built.
See where I am trying to get?
So when I do
select *
from tbl t
inner join tbl2 tt on t.sys = tt.sys and prin = ?????
How do I say "if prin is built in tbl, use prin, otherwise default to prin = 0"?
I know this is a badly stated question, so please ask for more specifics and I will try to answer quickly. This is affecting multiple tables.
Pretty ugly, but then so is the data model...
with
tbl ( s, prim, val ) as (
select 80, 0, 2 from dual union all
select 80, 1, 3 from dual union all
select 80, 2, 4 from dual
),
inputs ( s, prim ) as (
select 80, 1 from dual union all
select 80, 5 from dual
)
select t.s, i.prim i_prim, t.prim tbl_prim, t.val
from tbl t join inputs i
on t.s = i.s
and
( t.prim = i.prim
or t.prim = 0
and not exists ( select * from tbl
where s = i.s and prim = i.prim ))
;
S I_PRIM TBL_PRIM VAL
---- ---------- ---------- ----------
80 5 0 2
80 1 1 3
2 rows selected.
I would advise against using a (complex) JOIN with an OR condition on slightly bigger tables (50k+ rows); in my experience the execution plan can go badly wrong.
In such cases, rather use a union, (select cond1_match) union all (select cond2_default), ordered/ranked so that you select the first row, or use a JOIN like
select coalesce(a1.prin, a2.prin)
from (select cond1_match) a1
full join (select cond2_default) a2
And if I understand you correctly, that you have just one number as input and want to join another table, then my suggestion would look like this:
with
tbl ( SYSTEM, PRIN, DATA ) as (
select 80, 0, 2 from dual union all
select 80, 1, 3 from dual union all
select 80, 2, 4 from dual
),
tbl2 (SYSTEM, PRIN, OTHERDATA) as (
select 80, 0, 99 from dual union all
select 80, 1, 333 from dual union all
select 80, 2, 444 from dual
)
select t.system, t.prin, t.data, tt.otherdata
from tbl t
inner join tbl2 tt on t.system = tt.system and t.prin = tt.prin
where t.prin = (select nvl(max(prin), 0) from tbl where system = t.system and prin = :pri)
;
system + prin have to be unique or max() would be random
:pri = 5
SYSTEM PRIN DATA OTHERDATA
------ ---- ---- ---------
80 0 2 99
:pri = 2
SYSTEM PRIN DATA OTHERDATA
------ ---- ---- ---------
80 2 4 444
Only guessing about tbl2 and the join condition, but that's basically how I was told to look up data or use a default if no_data_found in SQL
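The "use the prin row if it exists, otherwise fall back to prin 0" rule that both answers implement can be stated compactly in Python. This is only a sketch of the lookup rule, with a hypothetical `lookup` function and a `CONFIG` dictionary built from the question's example rows; it says nothing about how Oracle will plan the equivalent join.

```python
# (system, prin) -> data, taken from the question's example table.
CONFIG = {(80, 0): 2, (80, 1): 3, (80, 2): 4}


def lookup(config, system, prin):
    """Return the prin-level value if it is built, else the system-level (prin 0) value."""
    if (system, prin) in config:
        return config[(system, prin)]
    # No prin-level row: fall back to the system-level row; None if that is missing too.
    return config.get((system, 0))


for prin in (1, 2, 8):
    print(prin, lookup(CONFIG, 80, prin))
```

Prin 1 resolves to 3, prin 2 to 4, and prin 8 falls back to the system-level value 2, which are the three cases spelled out in the question.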

SQL hierarchy count totals report

I'm creating a report with SQL server 2012 and Report Builder which must show the total number of Risks at a high, medium and low level for each Parent Element.
Each Element contains a number of Risks which are rated at a certain level. I need the total for the Parent Elements. The total will include the number of all the Child Elements and also the number the Element itself may have.
I am using CTEs in my query- the code I have attached isn't working (there are no errors - it's just displaying the incorrect results) and I'm not sure that my logic is correct??
Hopefully someone can help. Thanks in advance.
My table structure is:
ElementTable
ElementTableId(PK) ElementName ElementParentId
RiskTable
RiskId(PK) RiskName RiskRating ElementId(FK)
My query:
WITH cte_Hierarchy(ElementId, ElementName, Generation, ParentElementId)
AS (SELECT ElementId,
NAME,
0,
ParentElementId
FROM Extract.Element AS FirtGeneration
WHERE ParentElementId IS NULL
UNION ALL
SELECT NextGeneration.ElementId,
NextGeneration.NAME,
Parent.Generation + 1,
Parent.ElementId
FROM Extract.Element AS NextGeneration
INNER JOIN cte_Hierarchy AS Parent
ON NextGeneration.ParentElementId = Parent.ElementId),
CTE_HighRisk
AS (SELECT r.ElementId,
Count(r.RiskId) AS HighRisk
FROM Extract.Risk r
WHERE r.RiskRating = 'High'
GROUP BY r.ElementId),
CTE_LowRisk
AS (SELECT r.ElementId,
Count(r.RiskId) AS LowRisk
FROM Extract.Risk r
WHERE r.RiskRating = 'Low'
GROUP BY r.ElementId),
CTE_MedRisk
AS (SELECT r.ElementId,
Count(r.RiskId) AS MedRisk
FROM Extract.Risk r
WHERE r.RiskRating = 'Medium'
GROUP BY r.ElementId)
SELECT rd.ElementId,
rd.ElementName,
rd.ParentElementId,
Generation,
HighRisk,
MedRisk,
LowRisk
FROM cte_Hierarchy rd
LEFT OUTER JOIN CTE_HighRisk h
ON rd.ElementId = h.ElementId
LEFT OUTER JOIN CTE_MedRisk m
ON rd.ElementId = m.ElementId
LEFT OUTER JOIN CTE_LowRisk l
ON rd.ElementId = l.ElementId
WHERE Generation = 1
Edit:
Sample Data
|ElementTableId(PK)|ElementName|ElementParentId|
|---|---|---|
|1|Main|0|
|2|Element1|1|
|3|Element2|1|
|4|SubElement1|2|

|RiskId(PK)|RiskName|RiskRating|ElementId(FK)|
|---|---|---|---|
|a|Financial|High|2|
|b|HR|High|3|
|c|Marketing|Low|2|
|d|Safety|Medium|4|

Sample Output:
|Element Name|High|Medium|Low|
|---|---|---|---|
|Main|2|1|1|
Here are your sample tables:
SELECT * INTO #TABLE1
FROM
(
SELECT 1 ElementTableId, 'Main' ElementName ,0 ElementParentId
UNION ALL
SELECT 2,'Element1',1
UNION ALL
SELECT 3, 'Element2',1
UNION ALL
SELECT 4, 'SubElement1',2
)TAB
SELECT * INTO #TABLE2
FROM
(
SELECT 'a' RiskId, 'Financial' RiskName,'High' RiskRating ,2 ElementId
UNION ALL
SELECT 'b','HR','High',3
UNION ALL
SELECT 'c', 'Marketing','Low',2
UNION ALL
SELECT 'd', 'Safety','Medium',4
)TAB
We find the children of a parent and their counts of High, Medium and Low, then use a cross join to show the parent with the High, Medium and Low totals of its children.
UPDATE
The variable below can be used to select the parent element dynamically.
DECLARE @ElementTableId INT;
SET @ElementTableId = 1;
Use the variable inside the query:
;WITH CTE1 AS
(
SELECT *,0 [LEVEL] FROM #TABLE1 WHERE ElementTableId = @ElementTableId
UNION ALL
SELECT E.*,e2.[LEVEL]+1 FROM #TABLE1 e
INNER JOIN CTE1 e2 on e.ElementParentId = e2.ElementTableId
AND E.ElementTableId<>@ElementTableId
)
,CTE2 AS
(
SELECT E1.*,E2.*,COUNT(RiskRating) OVER(PARTITION BY RiskRating) CNT
from CTE1 E1
LEFT JOIN #TABLE2 E2 ON E1.ElementTableId=E2.ElementId
)
,CTE3 AS
(
SELECT DISTINCT T1.ElementName,C2.RiskRating,C2.CNT
FROM #TABLE1 T1
CROSS JOIN CTE2 C2
WHERE T1.ElementTableId = @ElementTableId
)
)
SELECT *
FROM CTE3
PIVOT(MIN(CNT)
FOR RiskRating IN ([High], [Medium],[Low])) AS PVTTable
UPDATE 2
I am updating as per your new requirement
Here are sample tables to which I have added extra data for testing:
SELECT * INTO #ElementTable
FROM
(
SELECT 1 ElementTableId, 'Main' ElementName ,0 ElementParentId
UNION ALL
SELECT 2,'Element1',1
UNION ALL
SELECT 3, 'Element2',1
UNION ALL
SELECT 4, 'SubElement1',2
UNION ALL
SELECT 5, 'Main 2',0
UNION ALL
SELECT 6, 'Element21',5
UNION ALL
SELECT 7, 'SubElement21',6
UNION ALL
SELECT 8, 'SubElement22',7
UNION ALL
SELECT 9, 'SubElement23',7
)TAB
SELECT * INTO #RiskTable
FROM
(
SELECT 'a' RiskId, 'Financial' RiskName,'High' RiskRating ,2 ElementId
UNION ALL
SELECT 'b','HR','High',3
UNION ALL
SELECT 'c', 'Marketing','Low',2
UNION ALL
SELECT 'd', 'Safety','Medium',4
UNION ALL
SELECT 'e' , 'Financial' ,'High' ,5
UNION ALL
SELECT 'f','HR','High',6
UNION ALL
SELECT 'g','HR','High',6
UNION ALL
SELECT 'h', 'Marketing','Low',7
UNION ALL
SELECT 'i', 'Safety','Medium',8
UNION ALL
SELECT 'j', 'Safety','High',8
)TAB
The logic is explained in the comments of the query:
;WITH CTE1 AS
(
-- Here you will find the level of every element in the table
SELECT *,0 [LEVEL]
FROM #ElementTable WHERE ElementParentId = 0
UNION ALL
SELECT ET.*,CTE1.[LEVEL]+1
FROM #ElementTable ET
INNER JOIN CTE1 on ET.ElementParentId = CTE1.ElementTableId
)
,CTE2 AS
(
-- Filters the level and finds the major parent of each child
-- ie, 100->150->200, here the main parent of 200 is 100
SELECT *,CTE1.ElementTableId MajorParentID,CTE1.ElementName MajorParentName
FROM CTE1 WHERE [LEVEL]=1
UNION ALL
SELECT CTE1.*,CTE2.MajorParentID,CTE2.MajorParentName
FROM CTE1
INNER JOIN CTE2 on CTE1.ElementParentId = CTE2.ElementTableId
)
,CTE3 AS
(
-- Since each child have columns for main parent id and name,
-- you will get the count of each element corresponding to the level you have selected directly
SELECT DISTINCT CTE2.MajorParentName,RT.RiskRating ,
COUNT(RiskRating) OVER(PARTITION BY MajorParentID,RiskRating) CNT
FROM CTE2
JOIN #RiskTable RT ON CTE2.ElementTableId=RT.ElementId
)
SELECT MajorParentName, ISNULL([High],0)[High], ISNULL([Medium],0)[Medium],ISNULL([Low],0)[Low]
FROM CTE3
PIVOT(MIN(CNT)
FOR RiskRating IN ([High], [Medium],[Low])) AS PVTTable
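For comparison, the roll-up itself can be sketched in a few lines of Python: walk each risk's element up to its top-level ancestor and count ratings there. Unlike the query above, which groups by the level 1 parent, this sketch rolls everything up to the root so that it reproduces the sample output from the question. The names `root_of` and `risk_totals` are hypothetical, the data is the question's original sample, and the sketch assumes the parent links contain no cycles.

```python
from collections import Counter

# ElementTableId -> (ElementName, ElementParentId); parent 0 means "no parent".
ELEMENTS = {
    1: ("Main", 0),
    2: ("Element1", 1),
    3: ("Element2", 1),
    4: ("SubElement1", 2),
}
# (RiskId, RiskName, RiskRating, ElementId)
RISKS = [
    ("a", "Financial", "High", 2),
    ("b", "HR", "High", 3),
    ("c", "Marketing", "Low", 2),
    ("d", "Safety", "Medium", 4),
]


def root_of(element_id, elements):
    """Follow parent links until the parent is not a known element."""
    while elements[element_id][1] in elements:
        element_id = elements[element_id][1]
    return element_id


def risk_totals(elements, risks):
    """Return {root element name: Counter of risk ratings over its whole subtree}."""
    totals = {}
    for _risk_id, _risk_name, rating, element_id in risks:
        root_name = elements[root_of(element_id, elements)][0]
        totals.setdefault(root_name, Counter())[rating] += 1
    return totals


totals = risk_totals(ELEMENTS, RISKS)
print(totals["Main"]["High"], totals["Main"]["Medium"], totals["Main"]["Low"])
```

For the original sample this gives Main a total of 2 High, 1 Medium and 1 Low, the same figures as the sample output in the question.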