Need to get rows where combination of two columns both exist and don't exit - sql

Trying to figure out if it's possible to write a single, set based query to return what I want with data in one single table. The below is just an example, and I need something that could easily work if most (but not all) of combinations 1 to 9 (or 1 to 20 etc) exist.
Table AllCovered has two columns. ID1 and ID2. There are 16 rows in this table, each containing a combination of the numbers 1 to 4 (so 1,1 1,2 1,3 1,4 2,1 .... 4,3 4,4)
Table SomeGaps has the same structure but only has 12 rows, again each row is a combination of 1 to 4, but with some of the combinations missing.
SELECT ID1, ID2, COUNT(ID1) as THIS
FROM AllCovered
GROUP BY ID1, ID2
- this query returns 16 rows, each combination with 1 in the 3rd column (THIS)
SELECT ID1, ID2, COUNT(ID1) as THIS
FROM SomeGaps
GROUP BY ID1, ID2
- this returns the 12 rows. How can I create query that will return 16 rows, of each combination but with 0 in THIS for the combinations that are missing in somegaps?
ID1 ID2 THIS
1 1 1
1 2 0 (1,2 combination does NOT exist in SomeGaps)
1 3 1
1 4 1
2 1 1
2 2 0 (2,2 combination does NOT exist in SomeGaps)
Obviously I've tried using a crossjoin to get all combinations of ID1 and ID2 but the COUNT is, as expected, vastly inflated.
Hope this makes sense. Apologies if it's an easy solution, I can't seem to crack it!

You can do this by cross-joining all the distinct values for the two columns. Then use left outer join and aggregation to get the counts for all combinations:
select ac.id1, ac.id2, count(ac.id1) as cnt
from (select distinct id1 from AllCovered) ac1 cross join
(select distinct id2 from AllCovered) ac2 left join
AllCovered ac
on ac.id1 = ac1.id1 and ac.id2 = ac2.id2
group by ac.id1, ac.id2;

I'm probably missing something obvious, but I'll take a bite anyway:
create table #AllCovered (id1 int, id2 int);
insert #AllCovered values
(1,1),(1,2),(1,3),(1,4),(2,1),(2,2),(2,3),(2,4),(3,1),(3,2),(3,3),(3,4),(4,1),(4,2),(4,3),(4,4);
create table #gaps (id1 int, id2 int);
insert #gaps values(1,1),(1,2),(1,3),(1,4),(2,1),(2,4),(3,1),(3,2),(3,3),(4,1),(4,2),(4,4);
select #AllCovered.id1, #AllCovered.id2,
count(#gaps.id1) as this
from #AllCovered
left outer join #gaps
on #AllCovered.id1 = #gaps.id1 and #AllCovered.id2 = #gaps.id2
group by #AllCovered.id1, #AllCovered.id2;
drop table #AllCovered, #gaps
From your narrative, there are no duplicate combinations of (id1, id2) in neither table, and AllCovered contains all possible combinations -- otherwise will use distinct subqueries and fabricate AllCovered.

Related

An association x-reference table

I've got a SQL table like below where one value is linked to a second value and vice versa.
ROW
ID1
ID2
1
1
2
2
2
1
3
3
4
4
4
3
....
This might be some bad design but this is what I'm stuck with. I need to produce a SQL query in SQL Server to return only the following (doesn't matter which order):
ROW
ID1
ID2
1
1
2
3
3
4
....
OR
ROW
ID1
ID2
2
2
1
4
4
3
....
I've got a list of ID's (1, 2, 3, 4) which I used to query the table against ID1 field or ID2 field, but it always returns all the rows because those IDs exist in both columns.
I've tried looking at eliminating one row by looking if the one field it exists in the other column, but then I get no results. Obviously.
The one solution that could work is by looking at the rownum field and only get the even or odd rows. But this feels hacky. Also there might be other values in that list that is not part of my IN list, so that could possibly miss some rows?
Anything eloquent to consider from a TSQL perspective
Here's one (quite cumbersome but pretty effective) way to do it.
First, Create and populate sample table (Please save us this step in future questions):
CREATE TABLE Table1 (
[ROW] int,
[ID1] int,
[ID2] int
);
INSERT INTO Table1 ([ROW], [ID1], [ID2]) VALUES
(1, 1, 2),
(2, 2, 1),
(3, 3, 4),
(4, 4, 3),
(5, 1, 4);
Note: The last raw is not a part of the sample data you've provided, but I assumed you would also like to include in the results records where only one row had the connection beteween Id1 and Id2.
Then, use a couple of common table expression to get the minimum row number of any pair of Id1 and Id2, regardless of the order of ids, and then query the original table joined to the second cte:
WITH CTE1 AS
(
SELECT Row,
IIF(Id1 < Id2, Id1, Id2) As Small,
IIF(Id1 < Id2, Id2, Id1) As Big
FROM Table1
), CTE2 AS
(
SELECT Min(Row) As MinRow
FROM CTE1
GROUP BY Small, Big
)
SELECT Row, Id1, Id2
FROM Table1
JOIN CTE2
ON Row = MinRow;
Results:
Row Id1 Id2
1 1 2
3 3 4
5 1 4
You can see a live demo on DB<>Fiddle

How to conduct large-scale lookups in SQL?

I have a SQL table with several million rows, each row contains columns ID1 and ID2.
I'm trying to lookup the entries in this table for 100,000 unique combinations of ID1 and ID2 and export the results of this query to a CSV file.
In the past for smaller numbers of rows I've been able to use a query along the lines of
SELECT *
FROM Database.Table
WHERE ID1 = 5 AND ID2 = 3
OR ID1 = 9 AND ID2 = 33
OR ID1 = 59 AND ID2 = 332...
However this seems to break once I get beyond a few thousand combinations of ID1 and ID2.
What is the best approach for handling large lookups like this in SQL?
Load the CSV file into a table with two columns: id1 and id2. Make these the primary key.
Then use join or exists:
select bt.*
from bigtable bt join
csvtable csv
on bt.id1 = csv.id1 and bt.id2 = csv.id2;
Have you tried IN statement? https://www.mysqltutorial.org/sql-in.aspx/
SELECT * FROM table WHERE (ID1, ID2) in ( (5, 3), (10, 6), ..)

Combine three columns from different tables into one row

I am new to sql and are trying to combine a column value from three different tables and combine to one row in DB2 Warehouse on Cloud. Each table consists of only one row and unique column name. So what I want to is just join these three to one row their original column names.
Each table is built from a statement that looks like this:
SELECT SUM(FUEL_TEMP.FUEL_MLAD_VALUE) AS FUEL
FROM
(SELECT ML_ANOMALY_DETECTION.MLAD_METRIC AS MLAD_METRIC, ML_ANOMALY_DETECTION.MLAD_VALUE AS FUEL_MLAD_VALUE, ML_ANOMALY_DETECTION.TAG_NAME AS TAG_NAME, ML_ANOMALY_DETECTION.DATETIME AS DATETIME, DATA_CONFIG.SYSTEM_NAME AS SYSTEM_NAME
FROM ML_ANOMALY_DETECTION
INNER JOIN DATA_CONFIG ON
(ML_ANOMALY_DETECTION.TAG_NAME =DATA_CONFIG.TAG_NAME AND
DATA_CONFIG.SYSTEM_NAME = 'FUEL')
WHERE ML_ANOMALY_DETECTION.MLAD_METRIC = 'IFOREST_SCORE'
AND ML_ANOMALY_DETECTION.DATETIME >= (CURRENT DATE - 9 DAYS)
ORDER BY DATETIME DESC)
AS FUEL_TEMP
I have tried JOIN, INNER JOIN, UNION/UNION ALL, but can't get it to work as it should. How can I do this?
Use a cross-join like this:
create table table1 (field1 char(10));
create table table2 (field2 char(10));
create table table3 (field3 char(10));
insert into table1 values('value1');
insert into table2 values('value2');
insert into table3 values('value3');
select *
from table1
cross join table2
cross join table3;
Result:
field1 field2 field3
---------- ---------- ----------
value1 value2 value3
A cross join joins all the rows on the left with all the rows on the right. You will end up with a product of rows (table1 rows x table2 rows x table3 rows). Since each table only has one row, you will get (1 x 1 x 1) = 1 row.
Using UNION should solve your problem. Something like this:
SELECT
WarehouseDB1.WarehouseID AS TheID,
'A' AS TheSystem,
WarehouseDB1.TheValue AS TheValue
FROM WarehouseDB1
UNION
SELECT
WarehouseDB2.WarehouseID AS TheID,
'B' AS TheSystem,
WarehouseDB2.TheValue AS TheValue
FROM WarehouseDB2
UNION
WarehouseDB3.WarehouseID AS TheID,
'C' AS TheSystem,
WarehouseDB3.TheValue AS TheValue
FROM WarehouseDB3
Ill adapt the code with your table names and rows if you tell me what they are. This kind of query would return something like the following:
TheID TheSystem TheValue
1 A 10
2 A 20
3 B 30
4 C 40
5 C 50
As long as your column names match in each query, you should get the desired results.

How select values where all columns are null for particular ID, ID is not unique

I have a table with following format and I want to get the LotId if Value1 is null for all the rows.
Now If I am doing Select,
Select * from Table1 where Value1 IS null , I am getting back a row .
But I want nothing should be returned as there are two rows which have some value.
I thought of self join , but this can have n number of rows.
Id LotId Value1
-------------------------------------------------
1 LOt0065 NULL
2 LOt0065 SomeValue
3 LOt0065 SomeValue
I think you'll need to use an EXISTS subquery here:
SELECT a.lotid
FROM table1 a
WHERE NOT EXISTS (
SELECT 1
FROM table1 b
WHERE b.lotid = a.lotid
AND b.value1 IS NOT NULL
);
If my syntax is right, then this will show you all records that don't have any NULL values for that lotid:
It uses a SELECT 1 because the subquery doesn't need to show any value, it just needs to match on the outer query.
You compare the table in the inner query to the table in the outer query and match on the common field you're looking at (lotid in this case)
This could also be done with a NOT IN clause.
Does this give you the result you want?

return IDs that have ALL of a list of correpsonding values

I have table with 2 columns ....
id id2
1 1
1 2
1 3
2 1
2 2
2 4
3 2
3 3
3 4
I want to return the ids which have for example id2 in (1, 2, 4) but that has all of the values in the list.
In this above case it would return id = 2. Is this possible?
select id
from MyTable
where id2 in (1, 2, 4)
group by id
having count(distinct id2) = 3 --this must match the number of elements in IN clause
Update:
If the list of IDs is variable, then you should create an additional table that contains the varying sets of IDs, which you can then JOIN against to do your filtering.
Are you alluding to relational division? e.g. the supplier who supplies all products, the pilot that can fly all the planes in the hanger, etc?
If so, this article has many example implementations in SQL.
Do a self-join to test different rows on the same table in one go:
SELECT id
FROM t AS t0
JOIN t AS t1 ON t1.id=t0.id
JOIN t AS t2 ON t2.id=t1.id
WHERE t0.id2=1
AND t1.id2=2
AND t2.id2=4