Using Subqueries to remove duplicate IDs [closed]


Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 months ago.
I have 2 Tables.
Table 1 holds ID1 and ID2.
Table 2 holds ID2 and ID3.
Table 1 has unique values of ID1 and repeated values of ID2.
TABLE 1:
ID1 | ID2
1 1
2 2
3 3
4 3
5 4
6 5
7 5
8 6
9 7
10 6
Table 2 has unique values of ID2 and repeated values of ID3.
TABLE 2:
ID2 | ID3
1 1
2 1
3 2
4 3
5 2
6 4
7 5
I want one row for each unique ID3.
I need to remove duplicate ID2s from Table 1, dropping the rows with the smaller ID1 (that is, keeping the largest ID1 for each ID2).
So Table 1 now looks like:
TABLE 1:
ID1 | ID2
1 1
2 2
4 3
5 4
7 5
9 7
10 6
Now I want to go to Table 2 and remove duplicate ID3s the same way, dropping the rows with the smaller ID2 (keeping the largest ID2 for each ID3):
TABLE 2:
ID2 | ID3
2 1
4 3
5 2
6 4
7 5
So my end result should be (I am joining the tables because both of them have other relevant information I need to combine but these are the IDs I am sorting and filtering to get the correct row):
Final Table:
ID1 | ID2 | ID3
2 2 1
7 5 2
5 4 3
10 6 4
9 7 5
Now I have a single row for each ID3, based on the largest ID1 and ID2 associated with that ID3.
I have tried creating subqueries in the WHERE clause to remove the duplicates, but my understanding of SQL is not good enough to really figure out what is happening.
GROUP BY and DISTINCT do not work for this case.
Decision Tree
I added a Decision Tree to help visualize the problem. Essentially, each ID3 can potentially have multiple ID2s, which can potentially have multiple ID1s.
I want to keep only the largest ID1, which gives me the correct ID2 associated with that ID3.

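-- t holds Table 1 (ID1, ID2) and t2 holds Table 2 (ID2, ID3)
with t1 as (    -- Table 1: keep only the largest ID1 for each ID2
  select ID1, ID2
  from (
    select *
         , row_number() over (partition by ID2 order by ID1 desc) as rn
    from t
  ) t
  where rn = 1
),
t3 as (         -- Table 2: keep only the largest ID2 for each ID3
  select ID2, ID3
  from (
    select *
         , row_number() over (partition by ID3 order by ID2 desc) as rn
    from t2
  ) t
  where rn = 1
)
select t1.ID1
     , t1.ID2
     , t3.ID3
from t1
join t3 on t3.ID2 = t1.ID2
order by ID3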
ID1 | ID2 | ID3
2 2 1
7 5 2
5 4 3
10 6 4
9 7 5
Fiddle
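To check the answer's query locally, here is a minimal sketch that runs it against the question's sample data using Python's built-in sqlite3 module. It assumes the tables are named t and t2, as in the answer's query, and an SQLite build of 3.25 or later, which is needed for ROW_NUMBER(). The comments mark which deduplication step each CTE performs.

```python
import sqlite3

# In-memory copy of the question's sample data; table names t and t2 follow
# the answer's query (the question calls them Table 1 and Table 2).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t  (ID1 INTEGER, ID2 INTEGER);
CREATE TABLE t2 (ID2 INTEGER, ID3 INTEGER);
INSERT INTO t  VALUES (1,1),(2,2),(3,3),(4,3),(5,4),(6,5),(7,5),(8,6),(9,7),(10,6);
INSERT INTO t2 VALUES (1,1),(2,1),(3,2),(4,3),(5,2),(6,4),(7,5);
""")

query = """
WITH t1 AS (            -- keep the largest ID1 for each ID2
    SELECT ID1, ID2
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID2 ORDER BY ID1 DESC) AS rn
          FROM t) AS x
    WHERE rn = 1
),
t3 AS (                 -- keep the largest ID2 for each ID3
    SELECT ID2, ID3
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID3 ORDER BY ID2 DESC) AS rn
          FROM t2) AS y
    WHERE rn = 1
)
SELECT t1.ID1, t1.ID2, t3.ID3
FROM t1 JOIN t3 ON t3.ID2 = t1.ID2   -- inner join drops ID2s that lost in t3
ORDER BY t3.ID3
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

ROW_NUMBER() gives every row in a partition a distinct number even when values tie, so each ID2 and each ID3 keeps exactly one row. RANK() would keep all tied rows instead.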

Related

MSAccess - query to return result set of earliest rows with a unique combination of 2 columns

I have a table with the following columns.
ID (auto-inc)
When (datetime)
id1 (number)
id2 (number)
The combination of id1 and id2 can be unique or duplicated many times.
I need a query that returns the earliest record (by When) for each unique combination of id1+id2.
Example data:
ID | When | id1 | id2
1 1-Jan-2020 4 5
2 1-Jan-2019 4 5
3 1-Jan-2021 4 5
4 1-Jan-2020 4 4
5 1-Jan-2019 4 4
6 1-Jan-2021 4 6
I need this to return rows 2, 5 and 6.
I cannot figure out how to do this with an SQL query.
I have tried GROUP BY on the concatenation of id1 & id2, and I have tried "DISTINCT id1, id2", but neither returns the entire row of the record with the earliest When value.
If the result set can just return the ID that is fine also, I just need to know the rows that match these two requirements.

Okay, I had a few minutes to kill:
SELECT Data.* FROM Data WHERE ID IN (
SELECT TOP 1 ID FROM Data AS D
WHERE D.id1=Data.id1 AND D.id2=Data.id2 ORDER BY When);
or
SELECT Data.* FROM Data INNER JOIN (
SELECT id1, id2, Min(When) AS MW FROM Data
GROUP BY id1, id2) AS D
ON Data.When = D.MW AND Data.id1=D.id1 AND Data.id2=D.id2;
ID | When | id1 | id2
2 1/1/2019 4 5
5 1/1/2019 4 4
6 1/1/2021 4 6
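Here is a minimal sketch of the JOIN/MIN answer, run in SQLite through Python's sqlite3 module rather than in Access. It stores dates as ISO-8601 text, which is an assumption made for this sketch (SQLite has no Access-style date type), so that text ordering matches date ordering. When is quoted because it is a keyword in SQLite.

```python
import sqlite3

# Sample rows from the question; ISO date strings are an assumption so that
# MIN() on text picks the chronologically earliest date.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Data (ID INTEGER PRIMARY KEY, "When" TEXT, id1 INTEGER, id2 INTEGER)')
conn.executemany(
    "INSERT INTO Data VALUES (?, ?, ?, ?)",
    [(1, "2020-01-01", 4, 5), (2, "2019-01-01", 4, 5), (3, "2021-01-01", 4, 5),
     (4, "2020-01-01", 4, 4), (5, "2019-01-01", 4, 4), (6, "2021-01-01", 4, 6)],
)

# Join each row to the earliest "When" of its (id1, id2) group.
earliest = conn.execute("""
    SELECT Data.ID
    FROM Data
    INNER JOIN (SELECT id1, id2, MIN("When") AS MW
                FROM Data
                GROUP BY id1, id2) AS D
        ON Data."When" = D.MW AND Data.id1 = D.id1 AND Data.id2 = D.id2
    ORDER BY Data.ID
""").fetchall()
earliest_ids = [row[0] for row in earliest]
print(earliest_ids)
```

If two rows for the same id1/id2 pair share the earliest When, this join returns both of them.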

Excluding rows based on column

I am trying to exclude rows where a value also exists in a different column of another row.
select * from TABLE1
ID1 ID2 VALUE
1 1 HIGH
2 2 MEDIUM
3 3 LOW
4 4 HIGH
5 4 HIGH
6 6 MEDIUM
All the data comes from the same table. I want to exclude the row with ID1 = 4 because the value 4 also exists in column ID2 of row 5. The final desired result is as follows:
ID1 ID2 VALUE
1 1 HIGH
2 2 MEDIUM
3 3 LOW
6 6 MEDIUM
I tried using a simple query such as:
Select * from TABLE1 Where ID1 = ID2
But this also includes row 4, as shown below, and I need to exclude it because its value exists in the ID2 column of another row:
ID1 ID2 VALUE
1 1 HIGH
2 2 MEDIUM
3 3 LOW
4 4 HIGH
6 6 MEDIUM

You just have to add the condition below; it excludes the records whose id2 appears more than once:
and id2 not in (Select id2 from table1 group by id2 having count(*) > 1)
Add a similar condition for id1, combined with OR, if needed.

You can use the logic in the query below.
select * from t T1
Where 2 > (Select count(1) from t T2 where T2.id2 = T1.id2);
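Both answers can be checked against the question's data with a short sketch using Python's sqlite3 module. The table name TABLE1 comes from the question. The first answer's condition is appended to the asker's ID1 = ID2 query, and the second query is used as posted, except that it reads TABLE1 instead of t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (ID1 INTEGER, ID2 INTEGER, VALUE TEXT)")
conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?)", [
    (1, 1, "HIGH"), (2, 2, "MEDIUM"), (3, 3, "LOW"),
    (4, 4, "HIGH"), (5, 4, "HIGH"), (6, 6, "MEDIUM"),
])

# First answer: the asker's ID1 = ID2 filter plus NOT IN on repeated ID2 values.
not_in_rows = conn.execute("""
    SELECT * FROM TABLE1
    WHERE ID1 = ID2
      AND ID2 NOT IN (SELECT ID2 FROM TABLE1 GROUP BY ID2 HAVING COUNT(*) > 1)
    ORDER BY ID1
""").fetchall()

# Second answer: keep a row only if its ID2 occurs fewer than 2 times.
count_rows = conn.execute("""
    SELECT * FROM TABLE1 AS T1
    WHERE 2 > (SELECT COUNT(1) FROM TABLE1 AS T2 WHERE T2.ID2 = T1.ID2)
    ORDER BY ID1
""").fetchall()

print(not_in_rows)
print(count_rows)
```

One design note: if more than one row had a NULL ID2, the NOT IN subquery would return a NULL, and the first query would return no rows at all. The COUNT-based form does not have that pitfall.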

Merging Multiple rows into one row by Entity Type [duplicate]

This question already has answers here:
Pivot rows to columns without aggregate
(3 answers)
Closed 4 years ago.
I have a table like this, with an entity type and an entity item ID. I would like to group the rows by the Id column and merge each group into one row, placing each item in the column for its entity type.
Id EntityItemId EntityTypeId
1 id1 1
1 id2 2
1 id3 3
1 id4 4
2 id5 1
2 id6 2
2 id7 3
Desired Output:
ID Entitytype1 Entitytype2 Entitytype3 Entitytype4
1 id1 id2 id3 id4
2 id5 id6 id7 null
Thanks

Use PIVOT (SQL Server syntax):
select ID ,[1] as Entitytype1,[2] as Entitytype2
,[3] as Entitytype3 ,[4] as Entitytype4 from
(
select * from t
) src
PIVOT
(
max(EntityItemId) for EntityTypeId in ([1],[2],[3],[4])
)pv
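SQLite has no PIVOT operator. The usual portable substitute is conditional aggregation, so this sketch swaps in that technique and runs it through Python's sqlite3 module. It assumes the source table is named t, as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER, EntityItemId TEXT, EntityTypeId INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "id1", 1), (1, "id2", 2), (1, "id3", 3), (1, "id4", 4),
    (2, "id5", 1), (2, "id6", 2), (2, "id7", 3),
])

# Each output column is a MAX over a CASE that passes through the item only for
# one entity type (NULL otherwise), so MAX picks that type's item per Id.
pivoted = conn.execute("""
    SELECT Id AS ID,
           MAX(CASE WHEN EntityTypeId = 1 THEN EntityItemId END) AS Entitytype1,
           MAX(CASE WHEN EntityTypeId = 2 THEN EntityItemId END) AS Entitytype2,
           MAX(CASE WHEN EntityTypeId = 3 THEN EntityItemId END) AS Entitytype3,
           MAX(CASE WHEN EntityTypeId = 4 THEN EntityItemId END) AS Entitytype4
    FROM t
    GROUP BY Id
    ORDER BY Id
""").fetchall()
for row in pivoted:
    print(row)
```

Like the PIVOT version, this needs one column expression per known EntityTypeId. A changing set of types would require generating the query text.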

SQL: Assembling Non-Overlapping Sets

I have sets of consecutive integers, organized by type, in table1. All values are between 1 and 10, inclusive.
table1:
row_id set_id type min_value max_value
1 1 a 1 3
2 2 a 4 10
3 3 a 6 10
4 4 a 2 5
5 5 b 1 9
6 6 c 1 7
7 7 c 3 10
8 8 d 1 2
9 9 d 3 3
10 10 d 4 5
11 11 d 7 10
In table2, within each type, I want to assemble all possible maximal, non-overlapping sets (though gaps that cannot be filled by any sets of the correct type are okay). Desired output:
table2:
row_id type group_id set_id
1 a 1 1
2 a 1 2
3 a 2 1
4 a 2 3
5 a 3 3
6 a 3 4
7 b 4 5
8 c 5 6
9 c 6 7
10 d 7 8
11 d 7 9
12 d 7 10
13 d 7 11
My current idea is to use the fact that there is a limited number of possible values. Steps:
1. Find all sets in table1 containing value 1. Copy them into table2.
2. Find all sets in table1 containing value 2 and not already in table2.
3. Join the sets from (2) with table1 on type, set_id, and having min_value greater than the group's greatest max_value.
4. For the sets from (2) that did not join in (3), insert them into table2. These start new groups that may be extended later.
5. Repeat steps (2) through (4) for values 3 through 10.
I think this will work, but it has a lot of pain-in-the-butt steps, especially for (2)--finding the sets not in table2, and (4)--finding the sets that did not join.
Do you know a faster, more efficient method? My real data has millions of sets, thousands of types, and hundreds of values (though fortunately, as in the example, the values are bounded), so scalability is essential.
I'm using PLSQL Developer with Oracle 10g (not 11g as I stated before--thanks, IT department). Thanks!

For Oracle 10g you can't use recursive CTEs, but with a bit of work you can do something similar with the connect by syntax. First you need to generate a CTE or in-line view which has all the non-overlapping links, which you can do with:
select t1.type, t1.set_id, t1.min_value, t1.max_value,
t2.set_id as next_set_id, t2.min_value as next_min_value,
t2.max_value as next_max_value,
row_number() over (order by t1.type, t1.set_id, t2.set_id) as group_id
from table1 t1
left join table1 t2 on t2.type = t1.type
and t2.min_value > t1.max_value
where not exists (
select 1
from table1 t4
where t4.type = t1.type
and t4.min_value > t1.max_value
and t4.max_value < t2.min_value
)
order by t1.type, group_id, t1.set_id, t2.set_id;
This took a bit of experimentation, and it's certainly possible I've missed or lost something about the rules in the process. However, it gives you 12 pseudo-rows and, as in my previous answer, it allows the two separate chains starting with a/1 to be followed while constraining the d values to a single chain:
TYPE SET_ID  MIN_VALUE  MAX_VALUE NEXT_SET_ID NEXT_MIN_VALUE NEXT_MAX_VALUE GROUP_ID
---- ------ ---------- ---------- ----------- -------------- -------------- --------
a         1          1          3           2              4             10        1
a         1          1          3           3              6             10        2
a         2          4         10                                                  3
a         3          6         10                                                  4
a         4          2          5           3              6             10        5
b         5          1          9                                                  6
c         6          1          7                                                  7
c         7          3         10                                                  8
d         8          1          2           9              3              3        9
d         9          3          3          10              4              5       10
d        10          4          5          11              7             10       11
d        11          7         10                                                 12
And that can be used as a CTE; querying that with a connect-by loop:
with t as (
... -- same as above query
)
select t1.type,
dense_rank() over (partition by null
order by connect_by_root group_id) as group_id,
t1.set_id
from t t1
connect by type = prior type
and set_id = prior next_set_id
start with not exists (
select 1 from table1 t2
where t2.type = t1.type
and t2.max_value < t1.min_value
)
and not exists (
select 1 from t t3
where t3.type = t1.type
and t3.next_max_value < t1.next_min_value
)
order by t1.type, group_id, t1.min_value;
The dense_rank() makes the group IDs contiguous; not sure if you actually need those at all, or if their sequence matters, so it's optional really. connect_by_root gives the group ID for the start of the chain, so although there were 12 rows and 12 group_id values in the initial query, they don't all appear in the final result.
The connection is via two prior values: type, and the next set ID found in the initial query. That creates all the chains, but on its own it would also include shorter chains (for d you'd see 8,9,10,11 but also 9,10,11 and 10,11), which you don't want as separate groups. Those are eliminated by the start with conditions, which could maybe be simplified.
That gives:
TYPE GROUP_ID SET_ID
---- -------- ------
a 1 1
a 1 2
a 2 1
a 2 3
a 3 4
a 3 3
b 4 5
c 5 6
c 6 7
d 7 8
d 7 9
d 7 10
d 7 11
SQL Fiddle demo.
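The link-building query in this answer uses only LEFT JOIN, NOT EXISTS and ROW_NUMBER(), so it can be sanity-checked outside Oracle. The sketch below runs it in SQLite through Python's sqlite3 module and keeps only the key columns. It assumes SQLite 3.25 or later for window functions. The CONNECT BY step is Oracle-specific and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (row_id INTEGER, set_id INTEGER, type TEXT,"
              " min_value INTEGER, max_value INTEGER)")
sets = [(1, "a", 1, 3), (2, "a", 4, 10), (3, "a", 6, 10), (4, "a", 2, 5),
        (5, "b", 1, 9), (6, "c", 1, 7), (7, "c", 3, 10),
        (8, "d", 1, 2), (9, "d", 3, 3), (10, "d", 4, 5), (11, "d", 7, 10)]
# row_id equals set_id in the question's sample data.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
                 [(sid, sid, typ, lo, hi) for sid, typ, lo, hi in sets])

links = conn.execute("""
    SELECT t1.type, t1.set_id, t2.set_id AS next_set_id,
           ROW_NUMBER() OVER (ORDER BY t1.type, t1.set_id, t2.set_id) AS group_id
    FROM table1 t1
    LEFT JOIN table1 t2
        ON t2.type = t1.type AND t2.min_value > t1.max_value
    WHERE NOT EXISTS (          -- no set fits entirely between t1 and t2
        SELECT 1
        FROM table1 t4
        WHERE t4.type = t1.type
          AND t4.min_value > t1.max_value
          AND t4.max_value < t2.min_value
    )
    ORDER BY group_id
""").fetchall()
for link in links:
    print(link)
```

When t2 is NULL (no later set), the comparison with t2.min_value is unknown. The NOT EXISTS therefore succeeds, which is why sets without a successor still appear once.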

If you can identify all the groups and their starting set_id then you can use a recursive approach and do this all in a single statement, rather than needing to populate a table iteratively. However you'd need to benchmark both approaches both for speed/efficiency and resource consumption - whether it will scale for your data volumes and within your system's available resources would need to be verified.
If I understand correctly when you decide to start a new group, you can identify all the groups at once with a query like:
with t as (
select t1.type, t1.set_id, t1.min_value, t1.max_value,
t2.set_id as next_set_id, t2.min_value as next_min_value,
t2.max_value as next_max_value
from table1 t1
left join table1 t2 on t2.type = t1.type and t2.min_value > t1.max_value
where not exists (
select 1
from table1 t3
where t3.type = t1.type
and t3.max_value < t1.min_value
)
)
select t.type, t.set_id, t.min_value, t.max_value,
t.next_set_id, t.next_min_value, t.next_max_value,
row_number() over (order by t.type, t.min_value, t.next_min_value) as grp_id
from t
where not exists (
select 1 from t t2
where t2.type = t.type
and t2.next_max_value < t.next_min_value
)
order by grp_id;
The tricky bit here is getting all three groups for a, specifically the two groups that start with set_id = 1, but only one group for d. The inner select (in the CTE) looks for sets that don't have a lower non-overlapping range, via the not exists clause, and outer-joins to the same table to get the later set(s) that don't overlap. That gives you two rows that start with set_id = 1, but also three that start with set_id = 8. The outer select then ignores everything but the nearest non-overlapping next set, using a second not exists clause, and doesn't have to hit the real table again.
So that gives you:
TYPE SET_ID  MIN_VALUE  MAX_VALUE NEXT_SET_ID NEXT_MIN_VALUE NEXT_MAX_VALUE GRP_ID
---- ------ ---------- ---------- ----------- -------------- -------------- ------
a         1          1          3           2              4             10      1
a         1          1          3           3              6             10      2
a         4          2          5           3              6             10      3
b         5          1          9                                                4
c         6          1          7                                                5
c         7          3         10                                                6
d         8          1          2           9              3              3      7
You can then use that as the anchor member in a recursive subquery factoring clause:
with t as (
...
),
r (type, set_id, min_value, max_value,
next_set_id, next_min_value, next_max_value, grp_id) as (
select t.type, t.set_id, t.min_value, t.max_value,
t.next_set_id, t.next_min_value, t.next_max_value,
row_number() over (order by t.type, t.min_value, t.next_min_value)
from t
where not exists (
select 1 from t t2
where t2.type = t.type
and t2.next_max_value < t.next_min_value
)
...
If you left the r CTE with just that and did select * from r, you'd get the same seven groups.
The recursive member then uses the next set_id and its range from that query as the next member of each group, and repeats the outer join/not-exists look up to find the next set(s) again; stopping when there is no next non-overlapping set:
...
union all
select r.type, r.next_set_id, r.next_min_value, r.next_max_value,
t.set_id, t.min_value, t.max_value, r.grp_id
from r
left join table1 t
on t.type = r.type
and t.min_value > r.next_max_value
and not exists (
select 1 from table1 t2
where t2.type = r.type
and t2.min_value > r.next_max_value
and t2.max_value < t.min_value
)
where r.next_set_id is not null -- to stop looking when you reach a leaf node
)
...
Finally you have a query based on the recursive CTE to get the columns you want and to specify the order:
...
select r.type, r.grp_id, r.set_id
from r
order by r.type, r.grp_id, r.min_value;
Which gets:
TYPE GRP_ID SET_ID
---- ---------- ----------
a 1 1
a 1 2
a 2 1
a 2 3
a 3 4
a 3 3
b 4 5
c 5 6
c 6 7
d 7 8
d 7 9
d 7 10
d 7 11
SQL Fiddle demo.
If you wanted to, you could show the min/max values for each set, and could track and show the min/max value for each group. I've just shown the columns from the question, though.
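To see the chaining rules on the sample data without an Oracle instance, here is an in-memory Python sketch that mirrors the recursive query's logic:

- Roots are sets with no lower non-overlapping set.
- The anchor pairs each root with every later set, then drops pairs whose successor starts after another pair's successor has already ended.
- Each recursive step follows every nearest non-overlapping successor.

The class and function names (IntSet, nearest_successors, extend, assemble) are hypothetical. The nested loops are quadratic, so this is for checking the expected groups, not for the millions of rows mentioned in the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntSet:
    set_id: int
    type: str
    min_value: int
    max_value: int


def nearest_successors(current, same_type):
    """Sets starting after `current` ends, with no other set fitting wholly
    in the gap (the NOT EXISTS test in the recursive member)."""
    after = [s for s in same_type if s.min_value > current.max_value]
    return [s for s in after if not any(m.max_value < s.min_value for m in after)]


def extend(chain, same_type):
    """Follow every nearest-successor branch until no later set is left."""
    successors = nearest_successors(chain[-1], same_type)
    if not successors:
        return [chain]
    chains = []
    for nxt in successors:
        chains.extend(extend(chain + [nxt], same_type))
    return chains


def assemble(sets):
    groups = []
    for typ in sorted({s.type for s in sets}):
        same = [s for s in sets if s.type == typ]
        # Anchor: sets with no lower non-overlapping set, paired with every later set.
        roots = [s for s in same if not any(o.max_value < s.min_value for o in same)]
        pairs = []
        for root in roots:
            later = [s for s in same if s.min_value > root.max_value]
            if later:
                pairs.extend((root, s) for s in later)
            else:
                pairs.append((root, None))
        # Outer filter of the anchor query: drop a pair if another pair's
        # successor ends before this pair's successor starts.
        anchors = [
            (root, nxt) for root, nxt in pairs
            if nxt is None
            or not any(o is not None and o.max_value < nxt.min_value for _, o in pairs)
        ]
        anchors.sort(key=lambda p: (p[0].min_value,
                                    p[1].min_value if p[1] is not None else 0))
        for root, nxt in anchors:
            groups.extend(extend([root] if nxt is None else [root, nxt], same))
    return groups


SAMPLE = [IntSet(sid, typ, lo, hi) for sid, typ, lo, hi in [
    (1, "a", 1, 3), (2, "a", 4, 10), (3, "a", 6, 10), (4, "a", 2, 5),
    (5, "b", 1, 9), (6, "c", 1, 7), (7, "c", 3, 10),
    (8, "d", 1, 2), (9, "d", 3, 3), (10, "d", 4, 5), (11, "d", 7, 10)]]

for grp_id, chain in enumerate(assemble(SAMPLE), start=1):
    print(chain[0].type, grp_id, [s.set_id for s in chain])
```

The seven chains match the TYPE/GRP_ID/SET_ID output above, with sets listed in min_value order (so group 3 is 4 then 3).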
