Group parents with same children - sql

EDIT: This is way harder to explain than I thought; I keep editing based on the comments. Thank you all for taking an interest.
I have a table like this
ID Type ParentID
1 ChildTypeA 1
2 ChildTypeB 1
3 ChildTypeC 1
4 ChildTypeD 1
5 ChildTypeA 2
6 ChildTypeB 2
7 ChildTypeC 2
8 ChildTypeA 3
9 ChildTypeB 3
10 ChildTypeC 3
11 ChildTypeD 3
12 ChildTypeA 4
13 ChildTypeB 4
14 ChildTypeC 4
and I want to group parents that have the same children, meaning the same number of children of the same types.
From the parent's point of view, there is a finite set of possible configurations (at most 10).
If parents have the same set of children (by ChildType), I want to group them together (in what I call a configuration).
ChildTypeA-D = ConfigA
ChildTypeA-C = ConfigB
ChildTypeA, B, E, F = ConfigX
etc.
The output I need is parents grouped by Configurations.
Config Group ParentID
ConfigA 1
ConfigA 3
ConfigB 2
ConfigB 4
I have no idea where to even begin.

I named your table t. Please check whether this is what you are looking for.
It shows both matched and unmatched parents.
It looks for parentids with the same number of rows (t1.cnt = t2.cnt) where all the rows match (having COUNT(*) = t1.cnt).
;with t1 as (select parentid, type, id, count(*) over (partition by parentid order by parentid) cnt from t),
t3 as
(
select t1.parentid parentid1, t2.parentid parentid2, count(*) cn, t1.cnt cnt1, t2.cnt cnt2, ROW_NUMBER () over (order by t1.parentid) rn
from t1 join t1 as t2 on t1.type = t2.type and t1.parentid <> t2.parentid and t1.cnt = t2.cnt
group by t1.parentid, t2.parentid, t1.cnt, t2.cnt
having COUNT(*) = t1.cnt
),
notFound as (
select t1.parentid, ROW_NUMBER() over(order by t1.parentid) rn
from t1
where not exists (select 1 from t3 where t1.parentid = t3.parentid1)
group by t1.parentid
)
select 'Config'+char((select min(rn)+64 from t3 as t4 where t3.parentid1 in (t4.parentid1 , t4.parentid2))) config, t3.parentid1
from t3
union all
select 'Config'+char((select max(rn)+64+notFound.rn from t3)) config, notFound.parentid
from notFound
OUTPUT
config parentid1
ConfigA 1
ConfigA 3
ConfigB 2
ConfigB 4
If id 14 was ChildTypeZ then parentid 2 and 4 wouldn't match. This would be the output:
config parentid1
ConfigA 1
ConfigA 3
ConfigC 2
ConfigD 4
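The count-matching idea can be tried out in isolation. The following is only a minimal sketch, not the answer's T-SQL: it runs portable SQL through Python's built-in sqlite3 module, and it assumes each parent has at most one child of each type (duplicates would inflate the join count). The CTE names `counts` and `matches` are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, type TEXT, parentid INTEGER)")

# Sample data from the question: parents 1 and 3 have types A-D, parents 2 and 4 have A-C.
sample = {1: "ABCD", 2: "ABC", 3: "ABCD", 4: "ABC"}
rows = [(f"ChildType{letter}", parentid)
        for parentid, letters in sample.items()
        for letter in letters]
conn.executemany("INSERT INTO t (type, parentid) VALUES (?, ?)", rows)

QUERY = """
WITH counts AS (
    SELECT parentid, COUNT(*) AS cnt FROM t GROUP BY parentid
),
matches AS (
    -- Pairs of parents (a parent also pairs with itself) that have the same
    -- number of children and whose child types all match.
    SELECT a.parentid AS p1, b.parentid AS p2
    FROM t AS a
    JOIN t AS b ON a.type = b.type
    JOIN counts AS ca ON ca.parentid = a.parentid
    JOIN counts AS cb ON cb.parentid = b.parentid
    WHERE ca.cnt = cb.cnt
    GROUP BY a.parentid, b.parentid, ca.cnt
    HAVING COUNT(*) = ca.cnt
)
-- The smallest matching parent id serves as the group id.
SELECT p1 AS parentid, MIN(p2) AS group_id
FROM matches
GROUP BY p1
ORDER BY p1
"""

result = conn.execute(QUERY).fetchall()
print(result)
```

Because every parent matches itself, a parent with a unique set of children simply ends up in a group of its own.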

I happen to have had a similar task. The data I'm working with is on a somewhat bigger scale, so I had to find an efficient approach. I found two approaches that work.
One is pure SQL; here is the core query. It returns the smallest ParentID with the same collection of children, which you can then use as a group id (you can also enumerate the groups with row_number). A small note: I'm using a CTE here, but in the real world I'd suggest putting the grouped parents into a temporary table and adding indexes to it.
;with cte_parents as (
-- You can also use different statistics to narrow the search
select
[ParentID],
count(*) as cnt,
min([Type]) as min_Type,
max([Type]) as max_Type
from Table1
group by
[ParentID]
)
select
h1.ParentID,
k.ParentID as GroupID
from cte_parents as h1
outer apply (
select top 1
h2.[ParentID]
from cte_parents as h2
where
h2.cnt = h1.cnt and
h2.min_Type = h1.min_Type and
h2.max_Type = h1.max_Type and
not exists (
select *
from (select tt.[Type] from Table1 as tt where tt.[ParentID] = h2.[ParentID]) as tt1
full join (select tt.[Type] from Table1 as tt where tt.[ParentID] = h1.[ParentID]) as tt2 on
tt2.[Type] = tt1.[Type]
where
tt1.[Type] is null or tt2.[Type] is null
)
order by
h2.[ParentID]
) as k
ParentID GroupID
----------- --------------
1 1
2 2
3 1
4 2
The other one is a bit trickier and you have to be careful when using it, but it works surprisingly well. The idea is to concatenate each parent's children into one big string and then group by these strings. You can use any available concatenation method (the XML trick, a CLR aggregate, or STRING_AGG if you have SQL Server 2017 or later). The important part is that the concatenation must be ordered, so that every string represents its group precisely. I have created a special CLR function (dbo.f_ConcatAsc) for this.
;with cte1 as (
select
ParentID,
dbo.f_ConcatAsc([Type], ',') as group_data
from Table1
group by
ParentID
), cte2 as (
select
dbo.f_ConcatAsc(ParentID, ',') as parent_data,
group_data,
row_number() over(order by group_data) as rn
from cte1
group by
group_data
)
select
cast(p.value as int) as ParentID,
c.rn as GroupID,
c.group_data
from cte2 as c
cross apply string_split(c.parent_data, ',') as p
ParentID GroupID group_data
----------- -------------------- --------------------------------------------------
2 1 ChildTypeA,ChildTypeB,ChildTypeC
4 1 ChildTypeA,ChildTypeB,ChildTypeC
1 2 ChildTypeA,ChildTypeB,ChildTypeC,ChildTypeD
3 2 ChildTypeA,ChildTypeB,ChildTypeC,ChildTypeD
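For readers without a CLR aggregate at hand, the ordered-concatenation idea can be sketched outside SQL Server. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, builds the signature string in Python rather than with dbo.f_ConcatAsc, and assumes the separator (a comma) never occurs inside a type name. The variable names are made up for the example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, type TEXT, parentid INTEGER)")
sample = {1: "ABCD", 2: "ABC", 3: "ABCD", 4: "ABC"}  # data from the question
conn.executemany(
    "INSERT INTO t (type, parentid) VALUES (?, ?)",
    [(f"ChildType{letter}", parentid)
     for parentid, letters in sample.items()
     for letter in letters],
)

# Collect each parent's child types in sorted order, so that equal sets
# always produce exactly the same signature string.
types_by_parent = defaultdict(list)
for parentid, child_type in conn.execute(
        "SELECT parentid, type FROM t ORDER BY parentid, type"):
    types_by_parent[parentid].append(child_type)

# Group parents by their signature.
parents_by_signature = defaultdict(list)
for parentid, types in types_by_parent.items():
    parents_by_signature[",".join(types)].append(parentid)

# Number the groups in signature order, like row_number() over(order by group_data).
grouped = [
    (parentid, group_id, signature)
    for group_id, signature in enumerate(sorted(parents_by_signature), start=1)
    for parentid in parents_by_signature[signature]
]
for row in grouped:
    print(row)
```

The ordering step is what makes this safe: without it, two parents with the same children could yield different strings and land in different groups.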

Related

How to get one row from table 2 (have blob column) for each row in table 1?

I'm using this query to get the first row from t2 that matches the primary key in t1, but I'm not getting the right results because I can't find the right way to write the query.
select t1.id as id , t1.title as title ,t2.fId, t2.image as image ,t2.fName as fName ,t2.fType as fType
from (select data_TBL.ID as id ,data_TBL.INT_STATUS as status,data_TBL.int_TYPE as type , data_TBL.TXT_TITLE as title ,data_TBL.dat_trans_date as cDate from data_TBL
ORDER BY dat_trans_date DESC ) t1,
(select int_data_id as fId ,int_category as category ,txt_attach_type as fType,txt_filename as fName,blob_file as image from data_attach_tbl
where int_category=1 and ROWNUM=1 and data_attach_tbl.int_data_id = id) t2
where
t1.type = 11
AND t1.status >= 1
and ROWNUM <=6
query conditions :
the result should be ordered desc
for each row in t1 fetch one row from t2 (data_attach_tbl.int_data_id = id)
6 rows are the final result
sample
t1 id date vac_title
1 15/10/2018 test 1
2 20/10/2018 test 2
3 21/10/2018 test 3
4 22/10/2018 test 4
5 23/10/2018 test 5
t2 id t1_Id file category
1 2 image 1 1
2 2 image 5 1
3 4 image 10 1
4 4 text file 2
5 4 image 3 1
6 5 image 2 1
result should be
t1_id date vac_title file
5 23/10/2018 test 5 image 2
4 22/10/2018 test 4 image 10
2 20/10/2018 test 2 image 5
The result is ordered by date, taking the first row from t2 that matches and has category = 1.
The second SELECT statement in the FROM clause can't find the related row that matches id.
Thanks
I think something like this should get the result:
SELECT t1.id AS id, t1.title AS title, t2.int_data_id AS fId, t2.blob_file AS image, t2.txt_filename AS fName, t2.txt_attach_type AS fType
FROM (SELECT i1.id, i1.txt_title AS title, i1.dat_trans_date AS cdate, MIN(i2.id) AS minID -- t2 primary key field
FROM data_TBL i1
INNER JOIN data_attach_tbl i2 ON i1.id = i2.int_data_id -- t1 foreign key field in t2
WHERE i2.int_category = 1
GROUP BY i1.id, i1.txt_title, i1.dat_trans_date ) t1 -- every non-aggregated column must be grouped
INNER JOIN data_attach_tbl t2 ON t1.minID = t2.id -- t2 primary key field
ORDER BY t1.cdate DESC;
I have ignored the WHERE conditions on t1 for type, status and ROWNUM, as you did not mention them in your conditions; if you need them, they should be simple to add back into the inner query.
Note I have not tested this in Oracle but it is standard SQL so should hopefully be ok.
I just found the solution to my problem; I hope it will be useful to somebody.
SELECT
t1.title AS title,
t2.int_data_id,
t2.blob_file AS image,
t2.txt_attach_name AS fname,
t2.txt_attach_type AS ftype,
t1.cdate,
t1.id AS aid
FROM
(
SELECT
data_tbl.id AS id,
data_tbl.int_status AS status,
data_tbl.int_type AS type,
data_tbl.txt_title AS title,
data_tbl.dat_trans_date AS cdate
FROM
data_tbl
ORDER BY
dat_trans_date DESC
) t1
INNER JOIN (
SELECT
id,
int_data_id,
int_category,
txt_attach_name,
blob_file,
txt_attach_type,
rownumber
FROM
(
SELECT
id,
int_data_id,
int_category,
txt_attach_name,
txt_attach_type,
blob_file,
ROW_NUMBER() OVER(
PARTITION BY int_data_id
ORDER BY
int_data_id DESC
) AS rownumber
FROM
data_attach_tbl
WHERE
int_category = 1
) d
WHERE
d.rownumber = 1
ORDER BY
int_data_id DESC
) t2 ON t1.id = t2.int_data_id
WHERE t1.type = 11
AND t1.status >= 1;

SQL - Returning unique row based on criteria and a priority

I have a data table that looks in practice like this:
Team Shirt Number Name
1 1 Seaman
1 13 Lucas
2 1 Bosnic
2 14 Schmidt
2 23 Woods
3 13 Tubilandu
3 14 Lev
3 15 Martin
I want to remove duplicates of team by the following logic - if there is a "1" shirt number, use that. If not, look for a 13. If not look for 14 then any.
I realise it is probably quite basic but I don't seem to be making any progress with case statements. I know it's something with sub-queries and case statements but I'm struggling and any help gratefully received!
Using SSMS.
Since you didn't specify a DBMS, let me assume row_number() is available:
WITH ranked AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY Team
                              ORDER BY (CASE WHEN Shirt_Number = 1
                                             THEN 1
                                             WHEN Shirt_Number = 13
                                             THEN 2
                                             WHEN Shirt_Number = 14
                                             THEN 3
                                             ELSE 4
                                        END)
                             ) AS Seq
    FROM [table] t
)
DELETE FROM ranked
WHERE Seq > 1; -- Seq = 1 is the row to keep for each team
The CASE expression is only needed if shirt numbers between 2 and 12 can occur; otherwise ordering by Shirt_Number alone is enough.
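I think you are looking for the PARTITION BY clause. The solution below worked in SQL Server. Note that because of the WHERE filter, a team that has none of the shirt numbers 1, 13 or 14 returns no row.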
create table #eray
(team int, shirtnumber int, name varchar(200))
insert into #eray values
(1, 1, 'Seaman'),
(1, 13, 'Lucas'),
(2, 1, 'Bosnic'),
(2, 14, 'Schmidt')
;with cte as (
Select Team, ShirtNumber, Name,
ROW_NUMBER() OVER (PARTITION BY Team ORDER BY ShirtNumber ASC) AS rn
From #eray
where ShirtNumber in (1,13,14)
)
select * from cte where rn=1
If you have a table of teams, you can use cross apply:
select ts.*
from teams t cross apply
(select top (1) ts.*
from teamshirts ts
where ts.team = t.team
order by (case shirt_number when 1 then 1 when 13 then 2 when 14 then 3 else 4 end)
) ts;
If you have no numbers between 2 and 12, you can simplify this to:
select ts.*
from teams t cross apply
(select top (1) ts.*
from teamshirts ts
where ts.team = t.team
order by shirt_number
) ts;
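The priority ordering can be checked on a small data set. Below is a minimal sketch using Python's built-in sqlite3 module; SQLite has no CROSS APPLY or TOP, so a correlated subquery with LIMIT 1 stands in for them. The table name `shirts` and the rows for teams 4 and 5 are hypothetical additions, included only to exercise the "14 beats a lower number" and "then any" cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shirts (team INTEGER, shirt_number INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO shirts VALUES (?, ?, ?)",
    [
        (1, 1, "Seaman"), (1, 13, "Lucas"),
        (2, 1, "Bosnic"), (2, 14, "Schmidt"), (2, 23, "Woods"),
        (3, 13, "Tubilandu"), (3, 14, "Lev"), (3, 15, "Martin"),
        # Hypothetical rows, not from the question:
        (4, 7, "Hypothetical A"), (4, 14, "Hypothetical B"),
        (5, 20, "Hypothetical C"), (5, 9, "Hypothetical D"),
    ],
)

QUERY = """
SELECT s.team, s.shirt_number, s.name
FROM shirts AS s
WHERE s.rowid = (
    -- Pick one row per team: shirt 1 first, then 13, then 14, then anything.
    SELECT s2.rowid
    FROM shirts AS s2
    WHERE s2.team = s.team
    ORDER BY CASE s2.shirt_number
                 WHEN 1 THEN 1 WHEN 13 THEN 2 WHEN 14 THEN 3 ELSE 4
             END,
             s2.shirt_number  -- tie-break so the "any" case is deterministic
    LIMIT 1
)
ORDER BY s.team
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Team 4 shows why the CASE expression matters: ordering by shirt_number alone would pick shirt 7 instead of 14.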

Aggregate data from multiple rows into single row

In my table each row has some data columns and a Priority column (for example, a timestamp or just an integer). I want to group my data by ID and then, in each group, take the latest non-null value of each column. For example, I have the following table:
id A B C Priority
1 NULL 3 4 1
1 5 6 NULL 2
1 8 NULL NULL 3
2 634 346 359 1
2 34 NULL 734 2
Desired result is :
id A B C
1 8 6 4
2 34 346 734
In this example the table is small and has only 5 columns, but the real table will be much larger. I really want this script to run fast. I tried to do it myself, but my script only works on SQL Server 2012+, so I deleted it as not applicable.
Numbers: the table could have 150k rows, 20 columns and 20-80k unique ids, and the average of SELECT COUNT(id) FROM T GROUP BY ID is 2 to 5.
Now I have working code (thanks to @ypercubeᵀᴹ), but it runs very slowly on big tables; in my case the script can take a minute or even more (with indexes and so on).
How can it be sped up?
SELECT
d.id,
d1.A,
d2.B,
d3.C
FROM
( SELECT id
FROM T
GROUP BY id
) AS d
OUTER APPLY
( SELECT TOP (1) A
FROM T
WHERE id = d.id
AND A IS NOT NULL
ORDER BY priority DESC
) AS d1
OUTER APPLY
( SELECT TOP (1) B
FROM T
WHERE id = d.id
AND B IS NOT NULL
ORDER BY priority DESC
) AS d2
OUTER APPLY
( SELECT TOP (1) C
FROM T
WHERE id = d.id
AND C IS NOT NULL
ORDER BY priority DESC
) AS d3 ;
In my test database with real amount of data I get following execution plan:
This should do the trick, everything raised to the power 0 will return 1 except null:
DECLARE @t table(id int,A int,B int,C int,Priority int)
INSERT @t
VALUES (1,NULL,3 ,4 ,1),
(1,5 ,6 ,NULL,2),(1,8 ,NULL,NULL,3),
(2,634 ,346 ,359 ,1),(2,34 ,NULL,734 ,2)
;WITH CTE as
(
SELECT id,
CASE WHEN row_number() over
(partition by id order by Priority*power(A,0) desc) = 1 THEN A END A,
CASE WHEN row_number() over
(partition by id order by Priority*power(B,0) desc) = 1 THEN B END B,
CASE WHEN row_number() over
(partition by id order by Priority*power(C,0) desc) = 1 THEN C END C
FROM @t
)
)
SELECT id, max(a) a, max(b) b, max(c) c
FROM CTE
GROUP BY id
Result:
id a b c
1 8 6 4
2 34 346 734
One alternative that might be faster is a multiple join approach. Get the priority for each column and then join back to the original table. For the first part:
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id;
Then join back to this table:
with pabc as (
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id
)
select pabc.id, ta.a, tb.b, tc.c
from pabc left join
t ta
on pabc.id = ta.id and pabc.pa = ta.priority left join
t tb
on pabc.id = tb.id and pabc.pb = tb.priority left join
t tc
on pabc.id = tc.id and pabc.pc = tc.priority ;
This can also take advantage of an index on t(id, priority).
The previous code will work with the following syntax:
with pabc as (
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id
)
select pabc.Id,ta.a, tb.b, tc.c
from pabc
left join t ta on pabc.id = ta.id and pabc.pa = ta.priority
left join t tb on pabc.id = tb.id and pabc.pb = tb.priority
left join t tc on pabc.id = tc.id and pabc.pc = tc.priority ;
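The join-back approach can be verified against the sample data from the question. This is a minimal sketch run through Python's built-in sqlite3 module rather than SQL Server; it assumes that (id, priority) is unique, otherwise the joins would return more than one row per id. The index name is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a INTEGER, b INTEGER, c INTEGER, priority INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, 3, 4, 1),
        (1, 5, 6, None, 2),
        (1, 8, None, None, 3),
        (2, 634, 346, 359, 1),
        (2, 34, None, 734, 2),
    ],
)
# The index mentioned in the answer; it supports both the grouping and the join-back.
conn.execute("CREATE INDEX ix_t_id_priority ON t (id, priority)")

QUERY = """
WITH pabc AS (
    -- For each id, the highest priority at which each column is non-null.
    SELECT id,
           MAX(CASE WHEN a IS NOT NULL THEN priority END) AS pa,
           MAX(CASE WHEN b IS NOT NULL THEN priority END) AS pb,
           MAX(CASE WHEN c IS NOT NULL THEN priority END) AS pc
    FROM t
    GROUP BY id
)
SELECT pabc.id, ta.a, tb.b, tc.c
FROM pabc
LEFT JOIN t AS ta ON ta.id = pabc.id AND ta.priority = pabc.pa
LEFT JOIN t AS tb ON tb.id = pabc.id AND tb.priority = pabc.pb
LEFT JOIN t AS tc ON tc.id = pabc.id AND tc.priority = pabc.pc
ORDER BY pabc.id
"""

result = conn.execute(QUERY).fetchall()
print(result)
```

A column that is NULL in every row of an id gets a NULL priority in `pabc`, so its left join finds no match and the column stays NULL in the output.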
This looks rather strange. You have a log table for all column changes, but no associated table with current data. Now you are looking for a query to collect your current values from the log table, which is a laborious task naturally.
The solution is simple: have an additional table with the current data. You can even link the tables with a trigger (so either every time a record is inserted into your log table you update the current table, or every time a change is written to the current table you write a log entry).
Then just query your current table:
select id, a, b, c from currenttable order by id;

How to get second parent with recursive query in Common Table

I am using SQL Server 2008. I have a table like this:
UnitId ParentId UnitName
---------------------------
1 0 FirstUnit
2 1 SecondUnit One
3 1 SecondUnit Two
4 3 B
5 2 C
6 4 D
7 6 E
8 5 F
I want to get the second-level parent of a record, that is, the ancestor directly below the root. For example:
If I choose unit id 8, the query should return unit id 2, which is SecondUnit One. If I choose unit id 7, it should return unit id 3, which is SecondUnit Two.
How can I write a SQL query this way?
It took me a while, but here it is :)
with tmp as (
select unitId, parentId, unitName, 0 as iteration
from t
where unitId = 7
union all
select parent.unitId, parent.parentId, parent.unitName, child.iteration + 1
from tmp child
join t parent on child.parentId = parent.unitId
where parent.parentId != 0
)
select top 1 unitId, parentId, unitName from tmp
order by iteration desc
SELECT t.*, tParent1.UnitId [FirstParent], tParent2.UnitId [SecondParent]
FROM Table t
LEFT JOIN Table tParent1 ON t.ParentId = tParent1.UnitId
LEFT JOIN Table tParent2 ON tParent1.ParentId = tParent2.UnitId
WHERE t.UnitId = <Unit ID search here>
AND NOT tParent2.UnitId IS NULL
Edit: And leave out second part of the WHERE clause if you want results returned even if they don't have a second parent.
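The recursive walk up the tree can be tried on the sample data. The sketch below is an adaptation, not the original T-SQL: it uses Python's built-in sqlite3 module, so the CTE needs the RECURSIVE keyword and LIMIT 1 replaces TOP 1. The table name `units` and the helper `second_level_parent` are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (unitId INTEGER PRIMARY KEY, parentId INTEGER, unitName TEXT)")
conn.executemany(
    "INSERT INTO units VALUES (?, ?, ?)",
    [
        (1, 0, "FirstUnit"),
        (2, 1, "SecondUnit One"),
        (3, 1, "SecondUnit Two"),
        (4, 3, "B"),
        (5, 2, "C"),
        (6, 4, "D"),
        (7, 6, "E"),
        (8, 5, "F"),
    ],
)

QUERY = """
WITH RECURSIVE chain(unitId, parentId, unitName, depth) AS (
    -- Start at the chosen unit.
    SELECT unitId, parentId, unitName, 0 FROM units WHERE unitId = ?
    UNION ALL
    -- Climb to the parent, but stop before reaching the root (parentId = 0).
    SELECT p.unitId, p.parentId, p.unitName, c.depth + 1
    FROM chain AS c
    JOIN units AS p ON c.parentId = p.unitId
    WHERE p.parentId <> 0
)
-- The deepest step of the climb is the ancestor directly below the root.
SELECT unitId, unitName FROM chain ORDER BY depth DESC LIMIT 1
"""

def second_level_parent(connection, unit_id):
    """Return (unitId, unitName) of the ancestor directly below the root."""
    return connection.execute(QUERY, (unit_id,)).fetchone()

print(second_level_parent(conn, 8))
print(second_level_parent(conn, 7))
```

Unlike a fixed number of self-joins, the recursive version does not depend on how deep the chosen unit sits in the hierarchy.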

Consolidate records

I want to consolidate a set of records
(id) / (referencedid)
1 10
1 11
2 11
2 10
3 10
3 11
3 12
The result of query should be
1 10
1 11
3 10
3 11
3 12
So, since id=1 and id=2 have the same set of corresponding reference ids {10,11}, they would be consolidated. But the reference ids of id=3 are not the same, hence it wouldn't be consolidated.
What would be a good way to get this done?
Select id, referenceid
From MyTable
Where Id In (
Select Min( Z.Id ) As Id
From (
-- Order inside Group_Concat; an Order By in a derived table is not guaranteed to be kept
Select Z1.id, Group_Concat( Z1.referenceid Order By Z1.referenceid ) As signature
From MyTable As Z1
Group By Z1.id
) As Z
Group By Z.Signature
)
-- generate count of elements for each distinct id
with Counts as (
select
id,
count(1) as ReferenceCount
from
tblReferences R
group by
R.id
)
-- generate every pairing of two different id's, along with
-- their counts, and how many are equivalent between the two
,Pairings as (
select
R1.id as id1
,R2.id as id2
,C1.ReferenceCount as count1
,C2.ReferenceCount as count2
,sum(case when R1.referenceid = R2.referenceid then 1 else 0 end) as samecount
from
tblReferences R1 join Counts C1 on R1.id = C1.id
cross join
tblReferences R2 join Counts C2 on R2.id = C2.id
where
R1.id < R2.id
group by
R1.id, C1.ReferenceCount, R2.id, C2.ReferenceCount
)
-- generate the list of ids that are safe to remove by picking
-- out any id's that have the same number of matches, and same
-- size of list, which means their reference lists are identical.
-- since id2 > id, we can safely remove id2 as a copy of id, and
-- the smallest id of which all id2 > id are copies will be left
,RemovableIds as (
select
distinct id2 as id
from
Pairings P
where
P.count1 = P.count2 and P.count1 = P.samecount
)
-- validate the results by just selecting to see which id's
-- will be removed. can also include id in the query above
-- to see which id was identified as the copy
select id from RemovableIds R
-- comment out `select` above and uncomment `delete` below to
-- remove the records after verifying they are correct!
--delete from tblReferences where id in (select id from RemovableIds)
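As a cross-check for either answer, set equality can also be expressed without concatenation or counting, using nested NOT EXISTS (relational division). This is a different technique from the two answers above, shown only as a minimal sketch through Python's built-in sqlite3 module; the table name `refs` is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refs (id INTEGER, referenceid INTEGER)")
conn.executemany(
    "INSERT INTO refs VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 11), (2, 10), (3, 10), (3, 11), (3, 12)],
)

QUERY = """
SELECT m.id, m.referenceid
FROM refs AS m
WHERE NOT EXISTS (
    -- Drop m.id if some smaller id has exactly the same set of reference ids.
    SELECT 1
    FROM (SELECT DISTINCT id FROM refs) AS o
    WHERE o.id < m.id
      -- every reference of m.id also belongs to o.id ...
      AND NOT EXISTS (
          SELECT 1 FROM refs AS a
          WHERE a.id = m.id
            AND NOT EXISTS (SELECT 1 FROM refs AS b
                            WHERE b.id = o.id AND b.referenceid = a.referenceid))
      -- ... and every reference of o.id also belongs to m.id
      AND NOT EXISTS (
          SELECT 1 FROM refs AS b
          WHERE b.id = o.id
            AND NOT EXISTS (SELECT 1 FROM refs AS a
                            WHERE a.id = m.id AND a.referenceid = b.referenceid))
)
ORDER BY m.id, m.referenceid
"""

result = conn.execute(QUERY).fetchall()
print(result)
```

This form tolerates duplicate (id, referenceid) rows, which the count-based pairing does not, but the nested subqueries may be slow on large tables.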