Excluding rows based on column - sql

I am trying to exclude rows whose value also appears in another column of a different row.
select * from TABLE1
ID1 ID2 VALUE
1 1 HIGH
2 2 MEDIUM
3 3 LOW
4 4 HIGH
5 4 HIGH
6 6 MEDIUM
All the data comes from the same table. I want to exclude ID1 = 4 because the value 4 also exists in column ID2 of row 5. The final desired result is as follows:
ID1 ID2 VALUE
1 1 HIGH
2 2 MEDIUM
3 3 LOW
6 6 MEDIUM
I tried using a simple query such as:
Select * from TABLE1 Where ID1 = ID2
But this wrongly includes row 4, as shown below. I need to exclude it because its value also exists in the ID2 column of another row:
ID1 ID2 VALUE
1 1 HIGH
2 2 MEDIUM
3 3 LOW
4 4 HIGH
6 6 MEDIUM

You just have to add the following condition to your query. It excludes the rows whose id2 value occurs more than once:
and id2 not in (Select id2 from table1 group by id2 having count(*) > 1)
Similarly add for id1 with OR

You can use the logic in the query below.
select * from t T1
Where 2 > (Select count(1) from t T2 where T2.id2 = T1.id2);
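For readers who want to try the idea above, here is a minimal sketch using Python's built-in sqlite3 module with the sample data from the question. It uses a NOT EXISTS variant of the counting query rather than the exact query from either answer, and it assumes ID1 is unique per row (true for the sample data). The helper names make_sample_db and rows_with_unique_id2 are made up for this illustration.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory copy of TABLE1 from the question (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE TABLE1 (ID1 INTEGER, ID2 INTEGER, "VALUE" TEXT)')
    conn.executemany(
        "INSERT INTO TABLE1 VALUES (?, ?, ?)",
        [
            (1, 1, "HIGH"),
            (2, 2, "MEDIUM"),
            (3, 3, "LOW"),
            (4, 4, "HIGH"),
            (5, 4, "HIGH"),
            (6, 6, "MEDIUM"),
        ],
    )
    return conn


def rows_with_unique_id2(conn):
    """Keep only rows whose ID2 is not shared with any other row.

    Assumption: ID1 identifies a row, so "another row" means a different ID1.
    """
    query = """
        SELECT ID1, ID2, "VALUE"
        FROM TABLE1 AS T1
        WHERE NOT EXISTS (
            SELECT 1
            FROM TABLE1 AS T2
            WHERE T2.ID2 = T1.ID2
              AND T2.ID1 <> T1.ID1
        )
        ORDER BY ID1
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in rows_with_unique_id2(make_sample_db()):
        print(row)
```

On the sample data this keeps the rows with ID1 1, 2, 3 and 6, which matches the desired result in the question.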

Related

MSAccess - query to return result set of earliest rows with a unique combination of 2 columns

I have a table with the following columns.
ID (auto-inc)
When (datetime)
id1 (number)
id2 (number)
The combination of id1 and id2 can be unique or duplicated many times.
I need a query that returns the earliest record (by When) for each unique combination of id1+id2.
Example data:
ID When id1 id2
1 1-Jan-2020 4 5
2 1-Jan-2019 4 5
3 1-Jan-2021 4 5
4 1-Jan-2020 4 4
5 1-Jan-2019 4 4
6 1-Jan-2021 4 6
I need this to return rows 2, 5 and 6.
I cannot figure out how to do this with an SQL query.
I have tried Group By on the concatenation of id1 & id2, and I have tried "Distinct id1, id2", but neither returns the entire row of the record with the earliest When value.
If the result set can only return the ID, that is fine too. I just need to know which rows match these two requirements.
Okay, I had a few minutes to kill:
SELECT Data.* FROM Data WHERE ID IN (
SELECT TOP 1 ID FROM Data AS D
WHERE D.id1=Data.id1 AND D.id2=Data.id2 ORDER BY When);
or
SELECT Data.* FROM Data INNER JOIN (
SELECT id1, id2, Min(When) AS MW FROM Data
GROUP BY id1, id2) AS D
ON Data.When = D.MW AND Data.id1=D.id1 AND Data.id2=D.id2;
ID When id1 id2
2 1/1/2019 4 5
5 1/1/2019 4 4
6 1/1/2021 4 6
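The second query above (join against the per-group minimum) can also be tried outside Access. The sketch below runs the same logic in SQLite through Python's sqlite3 module. This is an assumption-laden port, not the Access original: dates are stored as ISO text so that MIN compares them correctly, "When" is double-quoted because WHEN is an SQLite keyword, and the function name earliest_per_pair is invented for the example.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings (assumption).
SAMPLE_ROWS = [
    (1, "2020-01-01", 4, 5),
    (2, "2019-01-01", 4, 5),
    (3, "2021-01-01", 4, 5),
    (4, "2020-01-01", 4, 4),
    (5, "2019-01-01", 4, 4),
    (6, "2021-01-01", 4, 6),
]


def earliest_per_pair(rows):
    """Return the earliest row (by "When") for each (id1, id2) combination."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Data (ID INTEGER PRIMARY KEY, "When" TEXT, id1 INTEGER, id2 INTEGER)'
    )
    conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT Data.ID, Data."When", Data.id1, Data.id2
        FROM Data
        INNER JOIN (
            SELECT id1, id2, MIN("When") AS MW
            FROM Data
            GROUP BY id1, id2
        ) AS D
          ON Data."When" = D.MW
         AND Data.id1 = D.id1
         AND Data.id2 = D.id2
        ORDER BY Data.ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in earliest_per_pair(SAMPLE_ROWS):
        print(row)
```

If two rows in the same group share the earliest When value, this join returns both of them, so a tie-breaker (for example the smallest ID) would be needed when exactly one row per group is required.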

Using Subqueries to remove duplicate IDs [closed]

I have 2 Tables.
Table 1 holds ID1 and ID2.
Table 2 holds ID2 and ID3.
Table 1 has unique cases for ID1 and multiple cases for ID2.
TABLE 1:
ID1 | ID2
1 1
2 2
3 3
4 3
5 4
6 5
7 5
8 6
9 7
10 6
Table 2 has unique cases for ID2 and multiple cases for ID3
TABLE 2:
ID2 | ID3
1 1
2 1
3 2
4 3
5 2
6 4
7 5
I want 1 unique case of ID3.
I need to remove duplicate ID2s from Table 1. For each duplicated ID2, the row with the smaller ID1 is the one to remove.
So Table 1 now looks like:
TABLE 1:
ID1 | ID2
1 1
2 2
4 3
5 4
7 5
9 7
10 6
Now I want to go to Table 2 and remove any duplicate ID3s based on the smaller ID2
TABLE 2:
ID2 | ID3
2 1
4 3
5 2
6 4
7 5
So my end result should be (I am joining the tables because both of them have other relevant information I need to combine but these are the IDs I am sorting and filtering to get the correct row):
Final Table:
ID1 | ID2 | ID3
2 2 1
7 5 2
5 4 3
10 6 4
9 7 5
Where now I have a single case for each ID3 based on the largest ID1 and ID2 associated with that ID3.
I have tried creating subqueries in the WHERE clause to remove the duplicates, but my understanding of SQL is not good enough to really figure out what is happening.
Group By and DISTINCT do not work for this case.
Decision Tree
I added a Decision Tree to help visualize the problem. Essentially, each ID3 can potentially have multiple ID2s, which can potentially have multiple ID1s.
I want to keep only the largest ID1, which gives me the correct ID2 associated with that ID3.
with t1 as (
    select ID1, ID2
    from (
        select *,
               row_number() over (partition by ID2 order by ID1 desc) as rn
        from t
    ) t
    where rn = 1
),
t3 as (
    select ID2, ID3
    from (
        select *,
               row_number() over (partition by ID3 order by ID2 desc) as rn
        from t2
    ) t
    where rn = 1
)
select t1.ID1,
       t1.ID2,
       t3.ID3
from t1
join t3 on t3.ID2 = t1.ID2
order by ID3
ID1 ID2 ID3
2 2 1
7 5 2
5 4 3
10 6 4
9 7 5
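To check the row_number() approach above against the sample data, here is a small sketch using Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later) and follows the answer in naming the two source tables t and t2. The inner alias ranked and the function name latest_per_id3 are choices made for this example only.

```python
import sqlite3

# Sample data from the question: Table 1 (ID1, ID2) and Table 2 (ID2, ID3).
TABLE_1 = [(1, 1), (2, 2), (3, 3), (4, 3), (5, 4), (6, 5), (7, 5), (8, 6), (9, 7), (10, 6)]
TABLE_2 = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 2), (6, 4), (7, 5)]

QUERY = """
WITH t1 AS (
    -- keep the largest ID1 for each ID2
    SELECT ID1, ID2
    FROM (
        SELECT ID1, ID2,
               ROW_NUMBER() OVER (PARTITION BY ID2 ORDER BY ID1 DESC) AS rn
        FROM t
    ) AS ranked
    WHERE rn = 1
),
t3 AS (
    -- keep the largest ID2 for each ID3
    SELECT ID2, ID3
    FROM (
        SELECT ID2, ID3,
               ROW_NUMBER() OVER (PARTITION BY ID3 ORDER BY ID2 DESC) AS rn
        FROM t2
    ) AS ranked
    WHERE rn = 1
)
SELECT t1.ID1, t1.ID2, t3.ID3
FROM t1
JOIN t3 ON t3.ID2 = t1.ID2
ORDER BY t3.ID3
"""


def latest_per_id3():
    """Load the sample tables and return one (ID1, ID2, ID3) row per ID3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (ID1 INTEGER, ID2 INTEGER)")
    conn.execute("CREATE TABLE t2 (ID2 INTEGER, ID3 INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", TABLE_1)
    conn.executemany("INSERT INTO t2 VALUES (?, ?)", TABLE_2)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_id3():
        print(row)
```

Each CTE ranks rows within its partition and keeps rank 1, so the join only ever sees one ID1 per ID2 and one ID2 per ID3.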

LEFT JOIN with tables having boolean data producing unexpected result set

I have a Table_1 with a single column BOOLVALUE (int) containing these records:
1
1
0
0
0
and another Table_2 with a single column BOOLVALUE (int) containing these records:
1
1
1
0
0
I am trying to run this query:
select t1.BOOLVALUE from Table_1 t1
left join Table_2 t2 on t1.BOOLVALUE=t2.BOOLVALUE
and to my surprise the output is not what I expected. There are 12 rows, with six 1s and six 0s. Doesn't this contradict how joins work?
12 rows is completely expected as you have 2 rows related to 3 rows, resulting in 6 rows, and 3 rows related to 2 rows resulting in 6 rows; add these together and you get 12.
When you JOIN, all related rows are JOINed based on the ON clause. Your ON clause is t1.BOOLVALUE=t2.BOOLVALUE. This means all the 1s in Table_1 relate to all the 1s in Table_2; so that's 2 rows related to 3 rows (2 * 3). Then all the 0s in Table_1 relate to all the 0s in Table_2; so that's 3 rows related to 2 rows (3 * 2). Hence (2 * 3) + (3 * 2) = 6 + 6 = 12.
If we add an ID column to the table, this might become a little clearer.
Let's say you have 2 tables like this:
Table1:
ID1 I1
1 1
2 1
3 0
4 0
5 0
Table2:
ID2 I2
1 1
2 1
3 1
4 0
5 0
Then let's say you have the following query:
SELECT T1.ID1,
       T2.ID2,
       T1.I1,
       T2.I2
FROM dbo.Table1 T1
     JOIN dbo.Table2 T2 ON T1.I1 = T2.I2
ORDER BY T1.ID1,
         T2.ID2;
This would result in the following data set:
ID1 ID2 I1 I2
1 1 1 1
1 2 1 1
1 3 1 1
2 1 1 1
2 2 1 1
2 3 1 1
3 4 0 0
3 5 0 0
4 4 0 0
4 5 0 0
5 4 0 0
5 5 0 0
Here you can see you have a many to many join, and where the "extra" rows are coming from.
If you LEFT JOINed on the ID and I columns, starting at Table1, you would get 5 rows, with 1 row having NULL values for ID2 and I2 (in this case because although the ID matched, I did not):
SELECT T1.ID1,
       T2.ID2,
       T1.I1,
       T2.I2
FROM dbo.Table1 T1
     LEFT JOIN dbo.Table2 T2 ON T1.ID1 = T2.ID2
                            AND T1.I1 = T2.I2
ORDER BY T1.ID1,
         T2.ID2;
ID1 ID2 I1 I2
1 1 1 1
2 2 1 1
3 NULL 0 NULL
4 4 0 0
5 5 0 0
When you join on a column that has repeating values, the number of rows returned for each value is the product of the number of rows holding that value in the two tables.
In this case there are two 1s in table 1 and three in table 2, so SQL returns the 6 possible combinations (2 x 3). As there are also 3 x 2 = 6 zero combinations, you get 12 rows in total.
If you did a cross join you would get 25 rows back (5 x 5).
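The row counts discussed above are easy to confirm with a quick experiment. The following sketch uses Python's sqlite3 module with the ID-augmented Table1 and Table2 from the first answer. The dbo schema prefix is dropped because SQLite has no such schema, and the helper names (build_tables, predicted_join_rows) are made up for this illustration.

```python
import sqlite3
from collections import Counter

TABLE1_ROWS = [(1, 1), (2, 1), (3, 0), (4, 0), (5, 0)]  # (ID1, I1)
TABLE2_ROWS = [(1, 1), (2, 1), (3, 1), (4, 0), (5, 0)]  # (ID2, I2)


def build_tables():
    """Create the two sample tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ID1 INTEGER, I1 INTEGER)")
    conn.execute("CREATE TABLE Table2 (ID2 INTEGER, I2 INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", TABLE1_ROWS)
    conn.executemany("INSERT INTO Table2 VALUES (?, ?)", TABLE2_ROWS)
    return conn


def predicted_join_rows(left_values, right_values):
    """Sum, over each distinct value, of (left count) * (right count)."""
    left, right = Counter(left_values), Counter(right_values)
    return sum(left[v] * right[v] for v in left)


conn = build_tables()

# Join on the repeating value only: many-to-many.
value_join = conn.execute(
    "SELECT T1.ID1, T2.ID2 FROM Table1 T1 JOIN Table2 T2 ON T1.I1 = T2.I2"
).fetchall()

# Cross join: every row paired with every row.
cross_join = conn.execute(
    "SELECT T1.ID1, T2.ID2 FROM Table1 T1 CROSS JOIN Table2 T2"
).fetchall()

# Left join on both the ID and the value: at most one match per row.
left_join = conn.execute(
    """
    SELECT T1.ID1, T2.ID2, T1.I1, T2.I2
    FROM Table1 T1
    LEFT JOIN Table2 T2 ON T1.ID1 = T2.ID2 AND T1.I1 = T2.I2
    ORDER BY T1.ID1
    """
).fetchall()

predicted = predicted_join_rows(
    [i for _, i in TABLE1_ROWS], [i for _, i in TABLE2_ROWS]
)

if __name__ == "__main__":
    print(len(value_join), predicted, len(cross_join), len(left_join))
```

The value-only join yields 12 rows, matching the (2 * 3) + (3 * 2) prediction, the cross join yields 25, and the left join on both columns yields 5 rows with NULLs for the row where the ID matches but the value does not.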

sqlite delete all results where column a and column b is not in first n items

Let's say I have the following table:
a b c
-----------
1 1 5
1 2 3
4 1 2
1 2 4
4 2 10
And I want to delete all rows where none of the first n rows has the same value in a and b as that row.
So for example the resulting tables for various n's would be
n = 1
a b c
-----------
1 1 5
// No row other than the first has a 1 in a, and a 1 in b
n = 2
a b c
-----------
1 1 5
1 2 3
1 2 4
// The fourth row has the same values in a and b as the second, so it is not deleted. The first 2 rows of course match themselves so are not deleted
n = 3
a b c
-----------
1 1 5
1 2 3
4 1 2
1 2 4
// The fourth row has the same values in a and b as the second, so it is not deleted. The first 3 rows of course match themselves so are not deleted
n = 4
a b c
-----------
1 1 5
1 2 3
4 1 2
1 2 4
// The first 4 rows of course match themselves so are not deleted. The fifth row does not have the same value in both a and b as any of the first 4 rows, so is deleted.
I've been trying to work out how to do this using a not in or a not exists, but since I'm interested in two columns matching not just 1 or the whole record, I'm struggling.
Since you are not defining a specific order, the result is not completely defined, but depends on arbitrary choices of implementation regarding which rows are computed first in the limit clause. A different SQLite version for example may give you a different result. With that being said, I believe that you want the following query:
select t1.* from table1 t1,
(select distinct t2.a, t2.b from table1 t2 limit N) tabledist
where t1.a=tabledist.a and t1.b=tabledist.b;
Here you should replace N with the desired number of rows.
EDIT: So, to delete directly from the existing table you need something like:
with toremove(a, b, c) as
(select * from table1 tt
EXCEPT select t1.* from table1 t1,
(select distinct t2.a, t2.b from table1 t2 limit N) tabledist
where t1.a=tabledist.a and t1.b=tabledist.b)
delete from table1 where exists
(select * from toremove
where table1.a=toremove.a and table1.b=toremove.b and table1.c=toremove.c);
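One thing to be aware of is that select distinct ... limit N takes the first N distinct (a, b) pairs, which is not the same as the pairs found in the first N rows once those rows contain a repeated pair (as with n = 4 in the question). The sketch below is an alternative, not the query from the answer: it assumes that "first n rows" means the n rows with the lowest rowid (insertion order in a plain SQLite table) and deletes everything whose (a, b) pair does not occur among them. It uses Python's sqlite3 module, and the function name keep_matching_first_n is invented for the example.

```python
import sqlite3

SAMPLE_ROWS = [(1, 1, 5), (1, 2, 3), (4, 1, 2), (1, 2, 4), (4, 2, 10)]


def keep_matching_first_n(rows, n):
    """Delete rows whose (a, b) pair is absent from the first n rows.

    Assumption: row order is defined by rowid, i.e. insertion order here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        DELETE FROM table1
        WHERE rowid NOT IN (
            -- rowids of every row sharing (a, b) with one of the first n rows
            SELECT t.rowid
            FROM table1 AS t
            JOIN (
                SELECT a, b FROM table1 ORDER BY rowid LIMIT ?
            ) AS first_n
              ON t.a = first_n.a AND t.b = first_n.b
        )
        """,
        (n,),
    )
    remaining = conn.execute("SELECT a, b, c FROM table1 ORDER BY rowid").fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, keep_matching_first_n(SAMPLE_ROWS, n))
```

With the sample table this leaves one row for n = 1, three rows for n = 2, and the first four rows for both n = 3 and n = 4, in line with the expected tables in the question.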