I have a many-to-many relationship table, and I want to find the overlapping groups and merge them into one.
In the example below, user 2 is in groups 7 and 8, so groups 7 and 8 should be merged into one group that contains users 1, 2 and 4. The merged group id can be either 7 or 8; it doesn't matter which.
user_id | group
1       | 7
2       | 7
2       | 8
4       | 8
5       | 9
6       | 9
I wish to see output like this:
user_id | group
1       | 7
2       | 7
4       | 7
5       | 9
6       | 9
Answering my own question here: below is the SQL I built that fits my needs. It is inspired by @pankaj's answer.
with data(user_id,group_id) as (
select * from values
(1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
(6,9),(70,8),(21,51),(22,51),(23,52),
(24,51),(24,52),(25,26)
), group_members as (
select
group_id, array_agg(user_id) users
from data
group by group_id
), overlapped_group as (
select
c1.group_id g1,
c2.group_id g2,
-- c1.users,
-- c2.users,
least(g1, coalesce(g2, g1)) as min_group,
min(min_group) over (partition by g2) as merge_to
from group_members c1
left join
group_members c2 on arrays_overlap(c1.users, c2.users)
and g1 <> g2
), merge_mapping as (
select distinct
g1 as group_id,
iff(g2 is null, g1, min(merge_to) over (partition by g1)) as merge_to
from overlapped_group
)
select
user_id,
m.merge_to as group_id
from data
left join merge_mapping m using(group_id);
This is similar to a question asked earlier, wherein the grouping has to be resolved up to the top level of a hierarchy.
The query below aggregates the user_id values of each group_id into an array and then compares those arrays with each other.
When two arrays overlap, both groups get the same group id.
Once the overlapping arrays have been assigned a parent group id based on the minimum group value, we need to get to the top of the hierarchy.
There can also be multiple hierarchies in the data set, so we set the starting point of each hierarchy to NULL.
Lastly, we use a hierarchical query to get the final grouping.
with data(user_id,group_id) as (
select * from values
(1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
(6,9),(70,8),(21,51),(22,51),(23,52),
(24,51),(24,52),(25,26)
),cte_1 as
(select group_id,array_agg(user_id) arr
from data
group by group_id
), cte_2 as
(select c1.group_id g1, c2.group_id g2 ,
c1.arr arr1, c2.arr arr2,
case when arrays_overlap(arr1, arr2) then g1 end flag,
min(flag) over (partition by g2) grp,
case when g2 <> grp then grp end final_grp
from cte_1 c1, cte_1 c2
), cte_3 as
(select distinct g2, connect_by_root g2 as parent from cte_2
start with final_grp is null
connect by final_grp = prior g2
order by g2
), cte_4 as
(select c3.parent, c1.arr
from cte_1 c1 left join cte_3 c3
where c1.group_id = c3.g2
) select distinct value, parent as final_group
from cte_4,
lateral flatten(input=>arr)
order by value;
VALUE | FINAL_GROUP
1     | 7
2     | 7
4     | 7
5     | 7
6     | 7
21    | 51
22    | 51
23    | 51
24    | 51
25    | 26
70    | 7
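For comparison, merging overlapping groups is a connected-components problem, and outside SQL it is commonly handled with a union-find (disjoint-set) structure. The Python sketch below is only an illustration and not part of either answer; `merge_groups` is a hypothetical helper name. It reuses the sample rows from the query above and keeps the smallest group id of each merged component.

```python
# Sample rows (user_id, group_id) copied from the query above.
DATA = [
    (1, 7), (2, 7), (2, 8), (4, 9), (5, 9), (5, 8),
    (6, 9), (70, 8), (21, 51), (22, 51), (23, 52),
    (24, 51), (24, 52), (25, 26),
]


def merge_groups(pairs):
    """Map every user to the smallest group id of its merged component."""
    parent = {}  # group id -> parent group id (a root points to itself)

    def find(group):
        parent.setdefault(group, group)
        while parent[group] != group:
            parent[group] = parent[parent[group]]  # path halving
            group = parent[group]
        return group

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Always hang the larger root under the smaller one,
            # so the root of a component is its minimum group id.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_group = {}  # user id -> first group seen for that user
    for user, group in pairs:
        find(group)  # register the group even if it never merges
        if user in first_group:
            union(first_group[user], group)
        else:
            first_group[user] = group

    return sorted({(user, find(group)) for user, group in pairs})


if __name__ == "__main__":
    for user, group in merge_groups(DATA):
        print(user, group)
```

Because a user shared by two groups links them, chains such as 7-8-9 collapse into one component regardless of how many hops separate the groups.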
Adding another, simpler query.
with data(user_id,group_id) as (
select * from values
(1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
(6,9),(70,8),(21,51),(22,51),(22,52),
(22,53),(23,52),(25,26)
), cte_1 as
(select a.group_id grp1, b.group_id grp2
from data a, data b
where a.user_id = b.user_id
and a.group_id < b.group_id
), cte_2 as
(select grp2, connect_by_root grp1 as parent
from cte_1
start with grp1 not in (select grp2 from cte_1)
connect by grp1 = prior grp2
) select a.user_id,
coalesce(b.parent, a.group_id) final_grp
from data a left join cte_2 b
on a.group_id = b.grp2;
One way:
select user_id, STRTOK(listagg(group, ', ') within group (ORDER BY user_id ),',',1)
from <table>
GROUP BY user_id ORDER BY user_id;
I have two tables, Persons and Sales. Persons holds a parent-child relation. For a given sale, I want each person in the hierarchy, starting from the top, to receive 20 percent of whatever is left of the sale, as shown below.
Persons
Id | ParentId | Name
1 NULL Tom
2 1 Jake
3 2 Kate
4 3 Neil
Sales
PersonId | Sale
4 500
I want to get result like this
Id | ParentId | Name | Sale
1 Null Tom 100 <-- (500*20)/100 left 400
2 1 Jake 80 <-- (400*20)/100 left 320
3 2 Kate 64 <-- (320*20)/100 left 256
4 3 Neil 256 <-- (320*80)/100
I wrote this query, but it does not give the expected result:
;WITH cte_persons
AS
(
SELECT p.Id, p.ParentId, p.Name, s.Price FROM Persons AS p
INNER JOIN Sales AS s ON s.PersonId = p.Id
UNION ALL
SELECT p.Id, p.ParentId, p.Name, CAST((c.Price - (c.Price*80)/100) AS DECIMAL(6, 2)) FROM #Persons AS p
INNER JOIN cte_persons AS c ON c.ParentId = p.Id
)
SELECT * FROM cte_persons
This should be a two-step algorithm. First traverse the hierarchy to get the maximum level. Then apply the levels in reverse order.
WITH cte_persons
AS
(
SELECT 1 as level, p.Id, p.ParentId, p.Name, s.Price, p.Id AS base
FROM Persons AS p
INNER JOIN Sales AS s ON s.PersonId = p.Id
UNION ALL
SELECT level + 1, p.Id, p.ParentId, p.Name, c.Price, c.base
FROM Persons AS p
INNER JOIN cte_persons AS c ON c.ParentId = p.Id
)
SELECT Id, ParentId, Name,
CASE level WHEN 1
THEN price - sum(delta) over(partition by base order by level desc) + delta
ELSE delta END sale
FROM (
SELECT *,
(power (0.8000, max(level) over(partition by base) - level) * 0.2) * price delta
FROM cte_persons
) t
ORDER BY id;
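To make the arithmetic behind the two steps concrete, here is a small Python sketch (an illustration under assumptions, not a translation of the query). It walks from the seller up to the root, then hands out 20 percent of the remaining amount from the top down; `split_sale` and the dictionary layout are hypothetical names chosen for this example.

```python
from decimal import Decimal

# Id -> (ParentId, Name), copied from the Persons sample table.
PERSONS = {
    1: (None, "Tom"),
    2: (1, "Jake"),
    3: (2, "Kate"),
    4: (3, "Neil"),
}


def split_sale(persons, seller_id, amount, rate=Decimal("0.2")):
    """Return {person_id: share} for one sale made by seller_id."""
    # Step 1: traverse the hierarchy upwards, seller first, root last.
    chain = []
    node = seller_id
    while node is not None:
        chain.append(node)
        node = persons[node][0]

    # Step 2: apply the levels in reverse order (root first).
    remaining = Decimal(amount)
    shares = {}
    for person_id in reversed(chain[1:]):  # ancestors only
        cut = remaining * rate
        shares[person_id] = cut
        remaining -= cut
    shares[seller_id] = remaining  # the seller keeps what is left
    return shares


if __name__ == "__main__":
    for person_id, share in sorted(split_sale(PERSONS, 4, 500).items()):
        print(persons_name := PERSONS[person_id][1], share)
```

Decimal is used instead of float so that the shares stay exact for amounts like the ones in the sample.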
I have two tables, for instance "Employees" and "Projects". I need a list of all projects with the names of all the employees involved.
The problem is that the employe_id values are stored as comma-separated lists, as in the example below:
employe | ID project | employe_id
-------------- -----------------------
Person A | 1 Project X | ,2,
Person B | 2 Project Y |
Person C | 3 Project Z | ,1,3,
select
p.project, e.employe
from
projects p
left join employees e on e.id = p.employe_id ???
How do I have to write the join to get the desired output:
project | employe
--------------------
Project X | Person B,
Project Y |
Project Z | Person A, Person C
You can try group by and listagg as follows:
select
p.project,
listagg(e.employe,',') within group (order by e.id) as employee
from
projects p
left join employees e on p.employe_id like '%,' || e.id || ',%'
Group by p.project
Cheers!!
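As a cross-check of what the join is expected to produce, the same matching can be sketched in Python on the sample data. This is only an illustration; `employees_per_project` is a hypothetical helper, and it assumes that every non-empty token between commas is an integer id.

```python
# Sample data copied from the question.
EMPLOYEES = {1: "Person A", 2: "Person B", 3: "Person C"}
PROJECTS = [
    ("Project X", ",2,"),
    ("Project Y", None),
    ("Project Z", ",1,3,"),
]


def employees_per_project(projects, employees):
    """Return {project: 'name, name, ...'} from comma-wrapped id lists."""
    result = {}
    for project, id_list in projects:
        # ',1,3,' splits into ['', '1', '3', '']; skip the empty tokens.
        ids = [int(tok) for tok in (id_list or "").split(",") if tok.strip()]
        names = sorted(employees[i] for i in ids if i in employees)
        result[project] = ", ".join(names)
    return result


if __name__ == "__main__":
    for project, names in employees_per_project(PROJECTS, EMPLOYEES).items():
        print(project, "|", names)
```

Splitting on the delimiter compares whole ids, so an id such as 12 is never confused with 1 or 2.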
Here's a kind of fun way to do it:
-- '[^,]+' matches a whole id, so multi-digit ids are counted and extracted once
WITH cteId_counts_by_project AS (SELECT PROJECT,
NVL(REGEXP_COUNT(EMPLOYE_ID, '[^,]+'), 0) AS ID_COUNT
FROM PROJECTS),
cteMax_id_count AS (SELECT MAX(ID_COUNT) AS MAX_ID_COUNT
FROM cteId_counts_by_project),
cteProject_employee_ids AS (SELECT PROJECT,
EMPLOYE_ID,
REGEXP_SUBSTR(EMPLOYE_ID, '[^,]+',1, LEVEL) AS ID
FROM PROJECTS
CROSS JOIN cteMax_id_count m
CONNECT BY LEVEL <= m.MAX_ID_COUNT),
cteProject_emps AS (SELECT DISTINCT PROJECT, ID
FROM cteProject_employee_ids
WHERE ID IS NOT NULL),
cteProject_empnames AS (SELECT pe.PROJECT, pe.ID, e.EMPLOYE
FROM cteProject_emps pe
LEFT OUTER JOIN EMPLOYEES e
ON e.ID = pe.ID
ORDER BY pe.PROJECT, e.EMPLOYE)
SELECT p.PROJECT,
LISTAGG(pe.EMPLOYE, ',') WITHIN GROUP (ORDER BY pe.EMPLOYE) AS EMPLOYEE_LIST
FROM PROJECTS p
LEFT OUTER JOIN cteProject_empnames pe
ON pe.PROJECT = p.PROJECT
GROUP BY p.PROJECT
ORDER BY p.PROJECT
You can certainly compress some of the CTEs together to save space, but I kept them separate so you can see how each little bit adds to the solution.
Yet another option; code you need (as you already have those tables) begins at line #12:
SQL> -- Your sample data
SQL> with employees (employee, id) as
2 (select 'Person A', 1 from dual union all
3 select 'Person B', 2 from dual union all
4 select 'Person C', 3 from dual
5 ),
6 projects (project, employee_id) as
7 (select 'Project X', ',2,' from dual union all
8 select 'Project Y', null from dual union all
9 select 'Project Z', ',1,3,' from dual
10 ),
11 -- Employees per project
12 emperpro as
13 (select project, regexp_substr(employee_id, '[^,]+', 1, column_value) id
14 from projects cross join table(cast(multiset(select level from dual
15 connect by level <= regexp_count(employee_id, ',') + 1
16 ) as sys.odcinumberlist))
17 )
18 -- Final result
19 select p.project, listagg(e.employee, ', ') within group (order by null) employee
20 from emperpro p left join employees e on e.id = p.id
21 group by p.project
22 /
PROJECT EMPLOYEE
--------- ----------------------------------------
Project X Person B
Project Y
Project Z Person A, Person C
SQL>
You can extract the numeric IDs from the employee_id column of the projects table by combining regexp_substr() with rtrim() (which trims the trailing comma), and then concatenate the names with listagg():
with p2 as
(
select distinct p.*, regexp_substr(rtrim(p.employee_id,','),'[^,]+',1,level) as p_eid,
level as rn
from projects p
connect by level <= regexp_count(rtrim(p.employee_id,','),',')
)
select p2.project, listagg(e.employee,', ') within group (order by p2.rn) as employee
from p2
left join employees e on e.id = p2.p_eid
group by p2.project
I have a table containing details on my company's chart of accounts - this data is essentially stored in nested sets (on SQL Server 2014), with each record having a left and right anchor - there are no Parent IDs.
Sample Data:
ID LeftAnchor RightAnchor Name
1 0 25 Root
2 1 16 Group 1
3 2 9 Group 1.1
4 3 4 Account 1
5 5 6 Account 2
6 7 8 Account 3
7 10 15 Group 1.2
8 11 12 Account 4
9 13 14 Account 5
10 17 24 Group 2
11 18 23 Group 2.1
12 19 20 Account 1
13 21 22 Account 1
I need to materialize the path for each record, so that my output looks like this:
ID LeftAnchor RightAnchor Name MaterializedPath
1 0 25 Root Root
2 1 16 Group 1 Root > Group 1
3 2 9 Group 1.1 Root > Group 1 > Group 1.1
4 3 4 Account 1 Root > Group 1 > Group 1.1 > Account 1
5 5 6 Account 2 Root > Group 1 > Group 1.1 > Account 2
6 7 8 Account 3 Root > Group 1 > Group 1.1 > Account 3
7 10 15 Group 1.2 Root > Group 1 > Group 1.2
8 11 12 Account 4 Root > Group 1 > Group 1.2 > Account 4
9 13 14 Account 5 Root > Group 1 > Group 1.2 > Account 5
10 17 24 Group 2 Root > Group 2
11 18 23 Group 2.1 Root > Group 2 > Group 2.1
12 19 20 Account 1 Root > Group 2 > Group 2.1 > Account 10
13 21 22 Account 1 Root > Group 2 > Group 2.1 > Account 11
Whilst I've managed to achieve this using CTEs, the query is deathly slow. It takes just shy of two minutes to run with around 1200 records in the output.
Here's a simplified version of my code:
;with accounts as
(
-- Chart of Accounts
select AccountId, LeftAnchor, RightAnchor, Name
from ChartOfAccounts
-- dirty great where clause snipped
)
, parents as
(
-- Work out the Parent Nodes
select c.AccountId, p.AccountId [ParentId]
from accounts c
left join accounts p on (p.LeftAnchor = (
select max(i.LeftAnchor)
from accounts i
where i.LeftAnchor<c.LeftAnchor
and i.RightAnchor>c.RightAnchor
))
)
, path as
(
-- Calculate the Account path for each node
-- Root Node
select c.AccountId, c.LeftAnchor, c.RightAnchor, 0 [Level], convert(varchar(max), c.name) [MaterializedPath]
from accounts c
where c.LeftAnchor = (select min(LeftAnchor) from chart)
union all
-- Children
select n.AccountId, n.LeftAnchor, n.RightAnchor, p.level+1, p.path + ' > ' + n.name
from accounts n
inner join parents x on (n.AccountId=x.AccountId)
inner join path p on (x.ParentId=p.AccountId)
)
select * from path order by LeftAnchor
Ideally this query should only take a couple of seconds (max) to run. I can't make any changes to the database itself (read-only connection), so can anyone come up with a better way to write this query?
After your comments, I realized no need for the CTE... you already have the range keys.
Example
Select A.*
,Path = Replace(Path,'&gt;','>')
From YourTable A
Cross Apply (
Select Path = Stuff((Select ' > ' +Name
From (
Select LeftAnchor,Name
From YourTable
Where A.LeftAnchor between LeftAnchor and RightAnchor
) B1
Order By LeftAnchor
For XML Path (''))
,1,6,'')
) B
Order By LeftAnchor
First, you can rearrange the preparatory CTEs (accounts and parents) so that each CTE carries all the data from the previous one. The path CTE then only needs the last one, and the extra joins disappear:
;with accounts as
(
-- Chart of Accounts
select AccountId, LeftAnchor, RightAnchor, Name
from ChartOfAccounts
-- dirty great where clause snipped
)
, parents as
(
-- Work out the Parent Nodes
select c.*, p.AccountId [ParentId]
from accounts c
left join accounts p on (p.LeftAnchor = (
select max(i.LeftAnchor)
from accounts i
where i.LeftAnchor<c.LeftAnchor
and i.RightAnchor>c.RightAnchor
))
)
, path as
(
-- Calculate the Account path for each node
-- Root Node
select c.AccountId, c.LeftAnchor, c.RightAnchor, 0 [Level], convert(varchar(max), c.name) [MaterializedPath]
from parents c
where c.ParentID IS NULL
union all
-- Children
select n.AccountId, n.LeftAnchor, n.RightAnchor, p.level+1, p.[MaterializedPath] + ' > ' + n.name
from parents n
inner join path p on (n.ParentId=p.AccountId)
)
select * from path order by LeftAnchor
This should give some improvement (50% in my test). To improve it further, you can materialize the first half (the prepared data) into a #temp table, put a clustered index on the ParentID column of that table, and use it in the second part:
if (Object_ID('tempdb..#tmp') IS NOT NULL) DROP TABLE #tmp;
with accounts as
(
-- Chart of Accounts
select AccountId, LeftAnchor, RightAnchor, Name
from ChartOfAccounts
-- dirty great where clause snipped
)
, parents as
(
-- Work out the Parent Nodes
select c.*, p.AccountId [ParentId]
from accounts c
left join accounts p on (p.LeftAnchor = (
select max(i.LeftAnchor)
from accounts i
where i.LeftAnchor<c.LeftAnchor
and i.RightAnchor>c.RightAnchor
))
)
select * into #tmp
from parents;
CREATE CLUSTERED INDEX IX_tmp1 ON #tmp (ParentID);
With path as
(
-- Calculate the Account path for each node
-- Root Node
select c.AccountId, c.LeftAnchor, c.RightAnchor, 0 [Level], convert(varchar(max), c.name) [MaterializedPath]
from #tmp c
where c.ParentID IS NULL
union all
-- Children
select n.AccountId, n.LeftAnchor, n.RightAnchor, p.level+1, p.[MaterializedPath] + ' > ' + n.name
from #tmp n
inner join path p on (n.ParentId=p.AccountId)
)
select * from path order by LeftAnchor
It is hard to tell on such small sample data, but it should be an improvement. Please let me know if you try it.
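For reference, the parent lookup that the `parents` CTE does with a correlated subquery can also be done in a single pass over rows sorted by LeftAnchor, keeping a stack of the currently open ancestors. The Python sketch below illustrates that idea on a subset of the sample rows. It is an assumption-laden illustration rather than a drop-in replacement for the query, and `materialize_paths` is a hypothetical name.

```python
# (ID, LeftAnchor, RightAnchor, Name): a subset of the sample data.
ROWS = [
    (1, 0, 25, "Root"),
    (2, 1, 16, "Group 1"),
    (3, 2, 9, "Group 1.1"),
    (4, 3, 4, "Account 1"),
    (7, 10, 15, "Group 1.2"),
    (8, 11, 12, "Account 4"),
    (10, 17, 24, "Group 2"),
]


def materialize_paths(rows, sep=" > "):
    """Return {ID: materialized path} for a nested-set table."""
    paths = {}
    stack = []  # (RightAnchor, path) of the ancestors that are still open
    for node_id, left, right, name in sorted(rows, key=lambda r: r[1]):
        # An ancestor is closed once its RightAnchor lies before this node.
        while stack and stack[-1][0] < left:
            stack.pop()
        path = stack[-1][1] + sep + name if stack else name
        paths[node_id] = path
        stack.append((right, path))
    return paths


if __name__ == "__main__":
    for node_id, path in materialize_paths(ROWS).items():
        print(node_id, path)
```

Each row is pushed and popped at most once, so after the sort the pass is linear in the number of rows.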
Seems odd to me that you don't have a Parent ID, but with the aid of an initial OUTER APPLY, we can generate a Parent ID and then run a standard recursive CTE.
Example
Declare @Top int = null --<< Sets top of Hier Try 12 (Just for Fun)
;with cte0 as (
Select A.*
,B.*
From YourTable A
Outer Apply (
Select Top 1 Pt=ID
From YourTable
Where A.LeftAnchor between LeftAnchor and RightAnchor and LeftAnchor<A.LeftAnchor
Order By LeftAnchor Desc
) B
)
,cteP as (
Select ID
,Pt
,LeftAnchor
,RightAnchor
,Lvl=1
,Name
,Path = cast(Name as varchar(max))
From cte0
Where IsNull(@Top,-1) = case when @Top is null then isnull(Pt ,-1) else ID end
Union All
Select r.ID
,r.Pt
,r.LeftAnchor
,r.RightAnchor
,p.Lvl+1
,r.Name
,cast(p.path + ' > '+r.Name as varchar(max))
From cte0 r
Join cteP p on r.Pt = p.ID
)
Select *
From cteP
Order By LeftAnchor
I have a SQL table with a parent-child relation:
LOCATIONDETAIL Table
OID NAME PARENTOID
1 HeadSite 0
2 Subsite1 1
3 subsite2 1
4 subsubsite1 2
5 subsubsite2 2
6 subsubsite3 3
RULESETCONFIG
OID LOCATIONDETAILOID VALUE
1 1 30
2 4 15
If I provide LOCATIONDETAIL 6 as input, I should get the RULESETCONFIG value 30, because LOCATIONDETAIL 6 has parent 3, LOCATIONDETAIL 3 has no value in RULESETCONFIG, and LOCATIONDETAIL 3 has parent 1, which does have a value in RULESETCONFIG.
If I provide LOCATIONDETAIL 4 as input, I should get the RULESETCONFIG value 15.
I have code to populate the tree, but I don't know how to find the next available parent:
;WITH GLOBALHIERARCHY AS
(
SELECT A.OID,A.PARENTOID,A.NAME
FROM LOCATIONDETAIL A
WHERE OID = @LOCATIONDETAILOID
UNION ALL
SELECT A.OID,A.PARENTOID,A.NAME
FROM LOCATIONDETAIL A INNER JOIN GLOBALHIERARCHY GH ON A.PARENTOID = GH.OID
)
SELECT * FROM GLOBALHIERARCHY
This will return the next parent with a value. If you want to see all, remove the top 1 from the final select.
Example
Declare @Fetch int = 4
;with cteHB as (
Select OID
,PARENTOID
,Lvl=1
,NAME
From LOCATIONDETAIL
Where OID=@Fetch
Union All
Select R.OID
,R.PARENTOID
,P.Lvl+1
,R.NAME
From LOCATIONDETAIL R
Join cteHB P on P.PARENTOID = R.OID)
Select Top 1
Lvl = Row_Number() over (Order By A.Lvl Desc )
,A.OID
,A.PARENTOID
,A.NAME
,B.Value
From cteHB A
Left Join RULESETCONFIG B on A.OID=B.OID
Where B.VALUE is not null
and A.OID <> @Fetch
Order By 1 Desc
Returns when @Fetch=4
Lvl OID PARENTOID NAME Value
2 2 1 Subsite1 15
Returns when @Fetch=6
Lvl OID PARENTOID NAME Value
1 1 0 HeadSite 30
This should do the job:
;with LV as (
select OID ID,PARENTOID PID,NAME NAM, VALUE VAL FROM LOCATIONDETAIL
left join RULESETCONFIG ON LOCATIONDETAILOID=OID
), GH as (
select ID gID,PID gPID,NAM gNAM,VAL gVAL from LV where ID=@OID
union all
select ID,PID,NAM,VAL FROM LV INNER JOIN GH ON gVAL is NULL AND gPID=ID
)
select * from GH WHERE gVAL>0
See here for a little demo: http://rextester.com/OXD40496
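The lookup the question describes (take the location's own value if it has one, otherwise climb to the nearest ancestor that has one) can be sketched in a few lines of Python on the sample data. This is a hedged illustration of the intended rule, not a translation of either query; `effective_value` and the dictionary layout are assumptions made for the example.

```python
# OID -> PARENTOID, copied from LOCATIONDETAIL (0 means "no parent").
PARENTS = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}
# LOCATIONDETAILOID -> VALUE, copied from RULESETCONFIG.
RULES = {1: 30, 4: 15}


def effective_value(oid, parents, rules):
    """Return the value of oid or of its nearest ancestor that has one."""
    seen = set()  # guards against accidental cycles in the data
    while oid in parents and oid not in seen:
        if oid in rules:
            return rules[oid]
        seen.add(oid)
        oid = parents[oid]
    return None  # reached the top without finding a value


if __name__ == "__main__":
    for location in sorted(PARENTS):
        print(location, effective_value(location, PARENTS, RULES))
```

The loop stops at the first location with a value, which mirrors the `gVAL is NULL` condition in the recursive member of the last query.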