SQL Server query with intersect except or union relational Algebra - sql

I am trying to solve a problem. It seems like a bit of a brain teaser, if you ask me.
Given two tables, return only values from the first table when there is a match for EVERY record in a second table. So a record in table 1 must have a match to every record in table 2. If an id in table 1 matches fewer than all of the rows in table 2, I want to exclude it from the final result.
This must be done without using count, having, group by. I must solve it with union, intersect, except, exists.
I am using SQL Server BTW.
CREATE TABLE table1 (id int, lid int)
INSERT INTO table1 VALUES (1, 1),(1, 2),(1,3),(1,4),(2,1),(3,3),(4,4)
CREATE TABLE table2 (lid int)
INSERT INTO table2 VALUES (1),(2),(3),(4)
Table 1:
id lid
--------
1 1
1 2
1 3
1 4
2 1
3 3
4 4
Table2:
lid
-----
1
2
3
4
This method here is "not the way I am supposed to solve it". Frustrating because this solution is so simple and does exactly what it should do. I can't use count, group by, and having.
SELECT id
FROM dbo.table1, dbo.table2
WHERE table1.lid = table2.lid
GROUP BY id
HAVING COUNT(*) = (SELECT COUNT(*) FROM dbo.table2)
So basically I need to find a way to exclude the results from the first table when there is not a full set of matches in table 2. In this example the only value in table 1 with a match to every record in table 2 is 1. 2,3,4 would need to be excluded.

What you're looking for has a name. It's called relational division. It has no direct equivalent operator in SQL, although it can be emulated in a variety of ways. Joe Celko has written one of the most complete blog posts about the topic.
Since you must use some of the more basic relational operators in SQL, this could be one solution for you:
SELECT DISTINCT id
FROM table1 t1a
WHERE NOT EXISTS (
SELECT *
FROM table2 t2
WHERE NOT EXISTS (
SELECT *
FROM table1 t1b
WHERE t1a.id = t1b.id
AND t2.lid = t1b.lid
)
)
It reads in English, informally:
Get me all the ids in table1 for which there is no element in table2 that doesn't match a row of table1 with that id
Or also:
Get me the elements from table1, which match all the elements in table2
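A minimal sketch of the double NOT EXISTS reading above, run against the sample data from the question. It assumes SQLite through Python's sqlite3 module purely so it can be executed anywhere; the question targets SQL Server, where the same query text should apply. The helper name ids_matching_every_lid is made up for illustration.

```python
import sqlite3

# In-memory SQLite database; an assumption made only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id int, lid int);
    INSERT INTO table1 VALUES (1,1),(1,2),(1,3),(1,4),(2,1),(3,3),(4,4);
    CREATE TABLE table2 (lid int);
    INSERT INTO table2 VALUES (1),(2),(3),(4);
""")

DIVISION_SQL = """
    SELECT DISTINCT id
    FROM table1 t1a
    WHERE NOT EXISTS (          -- there is no table2 row ...
        SELECT * FROM table2 t2
        WHERE NOT EXISTS (      -- ... that this id fails to match
            SELECT * FROM table1 t1b
            WHERE t1a.id = t1b.id
              AND t2.lid = t1b.lid
        )
    )
    ORDER BY id
"""

def ids_matching_every_lid():
    """Hypothetical helper: run the division query and return the ids."""
    return [row[0] for row in conn.execute(DIVISION_SQL)]

full_matches = ids_matching_every_lid()
print(full_matches)  # [1]

# Shrink the divisor to a single row (lid = 1): ids 1 and 2 now both qualify.
conn.execute("DELETE FROM table2 WHERE lid <> 1")
matches_after_shrinking = ids_matching_every_lid()
print(matches_after_shrinking)  # [1, 2]
```

The second run shows that the result depends on the whole content of table2: the fewer rows the divisor has, the more ids qualify.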

That's one of the solutions:
select distinct id from table1 AS T1
where not exists(
select lid from table2
except
select lid from table1 where id = T1.id
)
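To see why the EXCEPT form works, it can help to look at the set difference for one id at a time. The sketch below assumes SQLite through Python's sqlite3 module (not SQL Server) and replaces the correlated T1.id with a bound parameter; missing_lids is a hypothetical helper name. An id qualifies exactly when its difference is empty, which is what NOT EXISTS tests.

```python
import sqlite3

# In-memory SQLite database; an assumption made only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id int, lid int);
    INSERT INTO table1 VALUES (1,1),(1,2),(1,3),(1,4),(2,1),(3,3),(4,4);
    CREATE TABLE table2 (lid int);
    INSERT INTO table2 VALUES (1),(2),(3),(4);
""")

def missing_lids(some_id):
    """lids present in table2 but absent for this id in table1."""
    rows = conn.execute(
        "SELECT lid FROM table2 "
        "EXCEPT "
        "SELECT lid FROM table1 WHERE id = ?",  # ? stands in for T1.id
        (some_id,),
    )
    return sorted(row[0] for row in rows)

all_ids = [row[0] for row in
           conn.execute("SELECT DISTINCT id FROM table1 ORDER BY id")]
missing_by_id = {i: missing_lids(i) for i in all_ids}
print(missing_by_id)  # {1: [], 2: [2, 3, 4], 3: [1, 2, 4], 4: [1, 2, 3]}

# NOT EXISTS (...) keeps only the ids whose difference is empty.
complete_ids = [i for i, missing in missing_by_id.items() if not missing]
print(complete_ids)  # [1]
```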

Related

Finding the id's which include multiple criteria in long format

Suppose I have a table like this,
id tagId
--------
1 1
1 2
1 5
2 1
2 5
3 2
3 4
3 5
3 8
I want to select id's where tagId includes both 2 and 5. For this fake data set, It should return 1 and 3.
I tried,
select id from [dbo].[mytable] where tagId IN(2,5)
But that returns every id that has either 2 or 5, not only the ids that have both. I also did not want to keep my table in wide format since tagId is dynamic and could grow to any number of columns. I also considered filtering with two different queries and (somehow) finding the intersection. However, since I may search for more than two tagId values in real life, that sounds inefficient to me.
I am sure that this is something faced before when tag searching. What do you suggest? Changing table format?
One option is to count the number of distinct tagIds (from the ones you're looking for) each id has:
SELECT id
FROM [dbo].[mytable]
WHERE tagId IN (2,5)
GROUP BY id
HAVING COUNT(DISTINCT tagId) = 2
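A small sketch of the counting approach on the sample data, assuming SQLite through Python's sqlite3 module rather than SQL Server. The function ids_with_all_tags and its count_expr parameter are hypothetical; they exist only to show that the number compared in HAVING must equal the number of distinct tags searched for, and why DISTINCT matters if the table can hold duplicate rows.

```python
import sqlite3

# In-memory SQLite database; an assumption made only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id int, tagId int)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 5), (2, 1), (2, 5), (3, 2), (3, 4), (3, 5), (3, 8)],
)

def ids_with_all_tags(tags, count_expr="COUNT(DISTINCT tagId)"):
    """Hypothetical helper: ids that carry every tag in `tags`."""
    wanted = sorted(set(tags))
    placeholders = ",".join("?" * len(wanted))
    sql = f"""
        SELECT id
        FROM mytable
        WHERE tagId IN ({placeholders})
        GROUP BY id
        HAVING {count_expr} = ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

both = ids_with_all_tags([2, 5])
three = ids_with_all_tags([2, 5, 8])
print(both, three)  # [1, 3] [3]

# A duplicate (2, 5) row fools COUNT(*) but not COUNT(DISTINCT tagId).
conn.execute("INSERT INTO mytable VALUES (2, 5)")
with_distinct = ids_with_all_tags([2, 5])
with_count_star = ids_with_all_tags([2, 5], count_expr="COUNT(*)")
print(with_distinct, with_count_star)  # [1, 3] [1, 2, 3]
```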
This is actually a Relational Division With Remainder question.
First, you have to place your input into proper table format. I suggest you use a Table Valued Parameter if executing from client code. You can also use a temp table or table variable.
CREATE TABLE #ids (tagId int PRIMARY KEY);
INSERT INTO #ids VALUES (2), (5);
There are a number of different solutions to this type of question.
Classic double-negative EXISTS
SELECT DISTINCT
mt.Id
FROM mytable mt
WHERE NOT EXISTS (SELECT 1
FROM #ids i
WHERE NOT EXISTS (SELECT 1
FROM mytable mt2
WHERE mt2.id = mt.id
AND mt2.tagId = i.tagId)
);
This is not usually efficient, though.
Comparing to the total number of IDs to match
SELECT mt.id
FROM mytable mt
JOIN #ids i ON i.tagId = mt.tagId
GROUP BY mt.id
HAVING COUNT(*) = (SELECT COUNT(*) FROM #ids);
This is much more efficient. You can also do this using a window function, it may be more or less efficient, YMMV.
SELECT mt.Id
FROM mytable mt
JOIN (
SELECT *,
total = COUNT(*) OVER ()
FROM #ids i
) i ON i.tagId = mt.tagId
GROUP BY mt.id
HAVING COUNT(*) = MIN(i.total);
Another solution involves cross-joining everything and checking how many matches there are using conditional aggregation
SELECT mt.id
FROM (
SELECT
mt.id,
mt.tagId,
matches = SUM(CASE WHEN i.tagId = mt.tagId THEN 1 END),
total = COUNT(*)
FROM mytable mt
CROSS JOIN #ids i
GROUP BY
mt.id,
mt.tagId
) mt
GROUP BY mt.id
HAVING SUM(matches) = MIN(total)
AND MIN(matches) >= 0;
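The conditional aggregation variant is easier to follow once the intermediate rows are visible. The sketch below assumes SQLite through Python's sqlite3 module, so the temp table is an ordinary table named ids and the T-SQL "alias = expression" syntax is rewritten with AS. It also assumes each (id, tagId) pair appears only once in mytable.

```python
import sqlite3

# In-memory SQLite database; an assumption made only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id int, tagId int)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 5), (2, 1), (2, 5), (3, 2), (3, 4), (3, 5), (3, 8)],
)
conn.execute("CREATE TABLE ids (tagId int PRIMARY KEY)")  # stands in for #ids
conn.executemany("INSERT INTO ids VALUES (?)", [(2,), (5,)])

# Inner step: every mytable row is paired with every searched tag.
# matches is 1 when the row's tag is one of the searched tags, else NULL;
# total is simply the number of searched tags.
INNER_SQL = """
    SELECT mt.id AS id,
           mt.tagId AS tagId,
           SUM(CASE WHEN i.tagId = mt.tagId THEN 1 END) AS matches,
           COUNT(*) AS total
    FROM mytable mt
    CROSS JOIN ids i
    GROUP BY mt.id, mt.tagId
"""
per_tag = conn.execute(INNER_SQL + " ORDER BY mt.id, mt.tagId").fetchall()
rows_for_id_2 = [row for row in per_tag if row[0] == 2]
print(rows_for_id_2)  # [(2, 1, None, 2), (2, 5, 1, 2)]

# Outer step: an id qualifies when its matched tags add up to the total.
OUTER_SQL = f"""
    SELECT agg.id
    FROM ({INNER_SQL}) agg
    GROUP BY agg.id
    HAVING SUM(agg.matches) = MIN(agg.total)
       AND MIN(agg.matches) >= 0
    ORDER BY agg.id
"""
matching_ids = [row[0] for row in conn.execute(OUTER_SQL)]
print(matching_ids)  # [1, 3]
```

For id 2 only one of the two searched tags is matched, so SUM(matches) is 1 while MIN(total) is 2, and the id is filtered out.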
There are other solutions also, see High Performance Relational Division in SQL Server

Find duplicates in table 2 and return back the record based on the id

I am trying to write a query to remove records in table 1 based on duplicate records found in table 2. id is the common link between these two tables. The database is Oracle. I am new to writing queries, and below is the query I came up with so far, which is not working. Can anyone please suggest a fix?
I am actually trying to delete a record in table 1 by id when there are duplicate records in table 2 for that id as well as one more column. Below is the error message I am getting. I am really not sure whether the query is accurate or whether I need to rewrite the whole query.
"invalid sql statement" - ORA-00900
DELETE TABLE AS m WHERE m.id IN
(SELECT id from table2 t WHERE ROWID >
(SELECT MIN(ROWID) FROM table2 r WHERE t.column2 = r.column2);
You can try the following (it uses only tab as the table in the statement):
create table tab ( id int, val int );
insert into tab values(1 ,331);
insert into tab values(1 ,332);
insert into tab values(2 ,333);
insert into tab values(2 ,333);
select * from tab;
ID VAL
1 331
1 332
2 333
2 333
delete tab a
where
rowid >
(select min(rowid)
from tab b
where b.id=a.id
group by b.id);
select * from tab;
ID VAL
1 331
2 333
As a side note: table is a reserved keyword in Oracle and may not be used as a table name; the quoted identifier "table" may be used instead.
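The keep-the-lowest-rowid idea can be tried without an Oracle instance. The sketch below is an approximation under a loud assumption: it uses SQLite through Python's sqlite3 module, whose integer rowid is not the same thing as Oracle's ROWID, although comparing against MIN(rowid) per id works the same way here.

```python
import sqlite3

# In-memory SQLite database; an assumption, not Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab (id int, val int);
    INSERT INTO tab VALUES (1, 331);
    INSERT INTO tab VALUES (1, 332);
    INSERT INTO tab VALUES (2, 333);
    INSERT INTO tab VALUES (2, 333);
""")

# Delete every row whose rowid is greater than the smallest rowid
# recorded for the same id, so one row per id survives.
conn.execute("""
    DELETE FROM tab
    WHERE rowid > (SELECT MIN(b.rowid)
                   FROM tab b
                   WHERE b.id = tab.id)
""")

remaining = conn.execute("SELECT id, val FROM tab ORDER BY id").fetchall()
print(remaining)  # [(1, 331), (2, 333)]
```

Note that this keeps one row per id regardless of val, which is why the (1, 332) row disappears as well.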
You would find the duplicates somehow. This is a little unclear, but perhaps:
select t2.column2, count(*)
from table2 t2
group by t2.column2
having count(*) >= 2;
You can then put this into the delete:
delete from m
where m.id in (select t2.column2
from table2 t2
group by t2.column2
having count(*) >= 2
);

Combine three columns from different tables into one row

I am new to SQL and am trying to combine a column value from each of three different tables into one row in DB2 Warehouse on Cloud. Each table consists of only one row and a unique column name. So what I want is just to join these three into one row, keeping their original column names.
Each table is built from a statement that looks like this:
SELECT SUM(FUEL_TEMP.FUEL_MLAD_VALUE) AS FUEL
FROM
(SELECT ML_ANOMALY_DETECTION.MLAD_METRIC AS MLAD_METRIC, ML_ANOMALY_DETECTION.MLAD_VALUE AS FUEL_MLAD_VALUE, ML_ANOMALY_DETECTION.TAG_NAME AS TAG_NAME, ML_ANOMALY_DETECTION.DATETIME AS DATETIME, DATA_CONFIG.SYSTEM_NAME AS SYSTEM_NAME
FROM ML_ANOMALY_DETECTION
INNER JOIN DATA_CONFIG ON
(ML_ANOMALY_DETECTION.TAG_NAME =DATA_CONFIG.TAG_NAME AND
DATA_CONFIG.SYSTEM_NAME = 'FUEL')
WHERE ML_ANOMALY_DETECTION.MLAD_METRIC = 'IFOREST_SCORE'
AND ML_ANOMALY_DETECTION.DATETIME >= (CURRENT DATE - 9 DAYS)
ORDER BY DATETIME DESC)
AS FUEL_TEMP
I have tried JOIN, INNER JOIN, UNION/UNION ALL, but can't get it to work as it should. How can I do this?
Use a cross-join like this:
create table table1 (field1 char(10));
create table table2 (field2 char(10));
create table table3 (field3 char(10));
insert into table1 values('value1');
insert into table2 values('value2');
insert into table3 values('value3');
select *
from table1
cross join table2
cross join table3;
Result:
field1 field2 field3
---------- ---------- ----------
value1 value2 value3
A cross join joins all the rows on the left with all the rows on the right. You will end up with a product of rows (table1 rows x table2 rows x table3 rows). Since each table only has one row, you will get (1 x 1 x 1) = 1 row.
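The row-count arithmetic above can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module instead of DB2, and the extra row inserted at the end is made up purely to show the product growing.

```python
import sqlite3

# In-memory SQLite database; an assumption made only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field1 char(10));
    CREATE TABLE table2 (field2 char(10));
    CREATE TABLE table3 (field3 char(10));
    INSERT INTO table1 VALUES ('value1');
    INSERT INTO table2 VALUES ('value2');
    INSERT INTO table3 VALUES ('value3');
""")

CROSS_SQL = "SELECT * FROM table1 CROSS JOIN table2 CROSS JOIN table3"

cur = conn.execute(CROSS_SQL)
column_names = [col[0] for col in cur.description]
one_row = cur.fetchall()
print(column_names)  # ['field1', 'field2', 'field3']
print(one_row)       # [('value1', 'value2', 'value3')]

# With a second (hypothetical) row in table2 the product is 1 x 2 x 1 = 2.
conn.execute("INSERT INTO table2 VALUES ('extra')")
row_count = len(conn.execute(CROSS_SQL).fetchall())
print(row_count)  # 2
```

The original column names are preserved, which is what the question asked for; the approach only yields a single row as long as every input really has exactly one row.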
Using UNION should solve your problem. Something like this:
SELECT
WarehouseDB1.WarehouseID AS TheID,
'A' AS TheSystem,
WarehouseDB1.TheValue AS TheValue
FROM WarehouseDB1
UNION
SELECT
WarehouseDB2.WarehouseID AS TheID,
'B' AS TheSystem,
WarehouseDB2.TheValue AS TheValue
FROM WarehouseDB2
UNION
SELECT
WarehouseDB3.WarehouseID AS TheID,
'C' AS TheSystem,
WarehouseDB3.TheValue AS TheValue
FROM WarehouseDB3
I'll adapt the code with your table names and rows if you tell me what they are. This kind of query would return something like the following:
TheID TheSystem TheValue
1 A 10
2 A 20
3 B 30
4 C 40
5 C 50
As long as your column names match in each query, you should get the desired results.

How select values where all columns are null for particular ID, ID is not unique

I have a table with the following format, and I want to get the LotId only if Value1 is null for all of its rows.
Now, if I run this select,
Select * from Table1 where Value1 IS null
I get back a row.
But I want nothing to be returned, as there are two rows which have some value.
I thought of a self join, but a LotId can have any number of rows.
Id LotId Value1
-------------------------------------------------
1 LOt0065 NULL
2 LOt0065 SomeValue
3 LOt0065 SomeValue
I think you'll need to use an EXISTS subquery here:
SELECT a.lotid
FROM table1 a
WHERE NOT EXISTS (
SELECT 1
FROM table1 b
WHERE b.lotid = a.lotid
AND b.value1 IS NOT NULL
);
If my syntax is right, then this will show you all lotids that don't have any non-NULL values, that is, lotids where Value1 is NULL on every row.
It uses a SELECT 1 because the subquery doesn't need to show any value, it just needs to match on the outer query.
You compare the table in the inner query to the table in the outer query and match on the common field you're looking at (lotid in this case)
This could also be done with a NOT IN clause.
Does this give you the result you want?
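A quick way to check the NOT EXISTS query is to run it on the sample rows plus one lot that really is all NULL. The sketch assumes SQLite through Python's sqlite3 module; the lot LOt0066 is invented for the test and is not in the question. It also shows that without DISTINCT the query returns one row per matching source row.

```python
import sqlite3

# In-memory SQLite database; an assumption made only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id int, lotid text, value1 text)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "LOt0065", None),
        (2, "LOt0065", "SomeValue"),
        (3, "LOt0065", "SomeValue"),
        (4, "LOt0066", None),  # hypothetical lot with only NULL values
        (5, "LOt0066", None),
    ],
)

# Keep a lot only if no row for that lot has a non-NULL value1.
WHERE_ALL_NULL = """
    FROM table1 a
    WHERE NOT EXISTS (
        SELECT 1
        FROM table1 b
        WHERE b.lotid = a.lotid
          AND b.value1 IS NOT NULL
    )
"""

per_row = [r[0] for r in conn.execute("SELECT a.lotid" + WHERE_ALL_NULL)]
per_lot = [r[0] for r in conn.execute("SELECT DISTINCT a.lotid" + WHERE_ALL_NULL)]
print(per_row)  # ['LOt0066', 'LOt0066']
print(per_lot)  # ['LOt0066']
```

LOt0065 is excluded because two of its rows carry a value, which matches the behaviour asked for in the question.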

I am trying to return a certain values in each row which depend on whether different values in that row are already in a different table

I'm still a n00b at SQL and am running into a snag. What I have is an initial selection of certain IDs into a temp table based upon certain conditions:
SELECT DISTINCT ID
INTO #TEMPTABLE
FROM ICC
WHERE ICC_Code = 1 AND ICC_State = 'CA'
Later in the query I SELECT a different and much longer listing of IDs along with other data from other tables. That SELECT is about 20 columns wide and is my result set. What I would like to be able to do is add an extra column to that result set, with each value of that column either TRUE or FALSE. If the ID in the row is in #TEMPTABLE, the value of the additional column should read TRUE. If not, FALSE. This way the added column will read TRUE or FALSE on each row, depending on whether the ID in that row is in #TEMPTABLE.
The second SELECT would be something like:
SELECT ID,
ColumnA,
ColumnB,
...
NEWCOLUMN
FROM ...
NEWCOLUMN's value for each row would depend on whether the ID in that row returned is in #TEMPTABLE.
Does anyone have any advice here?
Thank you,
Matt
If you left join to the #TEMPTABLE you'll get a NULL where the ID's don't exist
SELECT X.ID,
ColumnA,
ColumnB,
...
CASE WHEN T.ID IS NOT NULL THEN 1 ELSE 0 END AS NEWCOLUMN -- Gives 1 or 0; SQL Server cannot return a bare boolean expression as a column
FROM ... X
LEFT JOIN #TEMPTABLE T
ON T.ID = X.ID -- Define how the two rows can be related uniquely
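A runnable sketch of the LEFT JOIN flag, under several assumptions: it uses SQLite through Python's sqlite3 module, so #TEMPTABLE becomes a temp table named temptable, and the results table with its ColumnA values is entirely made up to stand in for the 20-column result set. The flag is written with CASE, which SQL Server also accepts.

```python
import sqlite3

# In-memory SQLite database; an assumption made only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ICC (ID int, ICC_Code int, ICC_State text);
    INSERT INTO ICC VALUES (1, 1, 'CA'), (2, 1, 'NY'), (3, 1, 'CA'),
                           (3, 1, 'CA'), (4, 2, 'CA');

    -- Stand-in for SELECT DISTINCT ID INTO #TEMPTABLE ...
    CREATE TEMP TABLE temptable AS
        SELECT DISTINCT ID FROM ICC
        WHERE ICC_Code = 1 AND ICC_State = 'CA';

    -- Hypothetical stand-in for the wide result set.
    CREATE TABLE results (ID int, ColumnA text);
    INSERT INTO results VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
""")

flagged = conn.execute("""
    SELECT X.ID,
           X.ColumnA,
           CASE WHEN T.ID IS NOT NULL THEN 1 ELSE 0 END AS NEWCOLUMN
    FROM results X
    LEFT JOIN temptable T
           ON T.ID = X.ID
    ORDER BY X.ID
""").fetchall()
print(flagged)  # [(1, 'a', 1), (2, 'b', 0), (3, 'c', 1), (4, 'd', 0)]
```

Because temptable holds each ID at most once, the LEFT JOIN cannot multiply rows of the result set; that is the reason the DISTINCT in the first SELECT matters here.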
You need to LEFT JOIN your results query to #TEMPTABLE ON ID. This will give you the ID if there is one and NULL if there isn't. If you want 1 or 0, this would do it (for SQL Server): CASE WHEN #TEMPTABLE.ID IS NULL THEN 0 ELSE 1 END.
A few notes on coding for performance:
By definition an ID column is unique, so the DISTINCT is redundant and causes unnecessary processing (unless it is an ID from another table)
Why would you store this to a temporary table rather than just using it in the query directly?
You could use a union and a subquery.
Select . . . . , 'TRUE'
From . . .
Where ID in
(Select id FROM #temptable)
UNION
SELECT . . . , 'FALSE'
FROM . . .
WHERE ID NOT in
(Select id FROM #temptable)
So the top part, SELECT ... FROM ... WHERE ID IN (Subquery), does a SELECT if the ID is in your temptable.
The bottom part does a SELECT if the ID is not in the temptable.
The UNION operator joins the two results nicely, since both SELECT statements will return the same number of columns.
To expand on what someone else was saying with Union, just do something like so
SELECT id, TRUE AS myColumn FROM `table1`
UNION
SELECT id, FALSE AS myColumn FROM `table2`