Efficient way to verify a table is a subset of another table - sql

I have two tables, A and B, with exactly the same structure. I need to verify that A is a subset of B. Because the structure contains over 100 fields, I do not want to list them one by one in a WHERE predicate.
I would like to know if there is an easier way to do that.

Assumptions:
(1) Identical table structure of A and B. This means that both order of columns and their data types have to match.
(2) There are no duplicate rows in table A
Problem description
To prove that A is a subset of B you need to show that A\B = empty set.
Solution
This means that if you remove every row in A that has a matching row in B and the output is empty (0 rows), then A is a subset of B.
If, on the other hand, the output has more than 0 rows, then A has rows that B doesn't, and A isn't a subset of B.
SELECT * FROM A
EXCEPT
SELECT * FROM B
When A is empty (contains 0 rows), it is treated as a subset of B, because the above query then returns 0 rows.
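A minimal runnable sketch of the EXCEPT check, using Python's built-in sqlite3 module as a stand-in database (SQLite supports EXCEPT with the same set semantics). The tables a and b, their columns and the helper is_subset are hypothetical; the table names are pasted into the SQL string, so they must come from a trusted source.

```python
import sqlite3

def is_subset(conn, small, big):
    """Return True if every row of `small` also appears in `big` (small EXCEPT big is empty)."""
    # Table names are interpolated directly: only pass trusted identifiers.
    query = f"SELECT * FROM {small} EXCEPT SELECT * FROM {big}"
    # fetchone() returns None when the difference has no rows.
    return conn.execute(query).fetchone() is None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE b (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "x"), (2, "y")])
conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])

print(is_subset(conn, "a", "b"))  # True: every row of a is in b
print(is_subset(conn, "b", "a"))  # False: (3, 'z') is missing from a
```

The same function can be pointed at any pair of identically structured tables by changing only the two names.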

@robertoplancarte's approach with a little tweaking:
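-- Idea: if A is a subset of B, adding A's rows to B with UNION (which removes
-- duplicates) does not change the number of distinct rows.
with tB_cnt as
(
-- number of distinct rows in B
SELECT COUNT(*) cnt FROM
(
SELECT DISTINCT * FROM dbo.T_B
) T_B
), tAB_cnt as
(
-- number of distinct rows in B UNION A
SELECT COUNT(*) cnt FROM
(
SELECT * FROM dbo.T_B
UNION
SELECT * FROM dbo.T_A
) T_AB
)
SELECT
CASE WHEN tB_cnt.cnt = tAB_cnt.cnt THEN
'Table A is a subset of B'
else
'Table A is not a subset of B'
END as Result
FROM tAB_cnt, tB_cnt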
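The counting approach can be sketched the same way, again with sqlite3 standing in for SQL Server (the dbo. schema prefix is dropped, and the table names and helper are hypothetical). It relies on UNION removing duplicates: if every row of A is already in B, the distinct row count of B UNION A equals that of B.

```python
import sqlite3

def is_subset_by_count(conn, small, big):
    """`small` is a subset of `big` exactly when UNION-ing it in adds no new distinct rows."""
    # Distinct rows already in the bigger table.
    distinct_big = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM {big})").fetchone()[0]
    # UNION (not UNION ALL) removes duplicates across both inputs.
    distinct_union = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT * FROM {big} UNION SELECT * FROM {small})").fetchone()[0]
    return distinct_big == distinct_union

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE b (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "x"), (2, "y")])
conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])

print(is_subset_by_count(conn, "a", "b"))  # True
print(is_subset_by_count(conn, "b", "a"))  # False
```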

Related

Query to return nonmatching lines in two arbitrary tables

I have two sets of tables (e.g. a.1, a.2, a.3, b.1, b.2, b.3, etc.) created using slightly different logic. Analogous tables in the two schemas have exactly the same columns (e.g. a.1 has the same columns as b.1). I believe that the tables in the two schemas should contain exactly the same information, but I want to test that belief. Therefore I want to write a query that compares two analogous tables and returns the rows that are not in both tables. Is there an easy way to write such a query without manually writing the join? In other words, can I write a query that produces the results I want, where I only have to change the names of the tables being compared and can leave the rest of the query unchanged?
To be a bit more explicit, I'm looking to do something like the following:
select *
from a.1
where (all columns in a.1) not in (select * from b.1);
If I could write something like this then all I would have to do to compare a.2 to b.2 would be to change the table names. However, it's not clear to me how to come up with the (all columns in a.1) piece in a general way.
Based on a recommendation in the comments, I've created the following showing the kind of thing I'd like to see:
https://dbfiddle.uk/?rdbms=db2_11.1&fiddle=ad0141b0daf8f8f92e6e3fa8d57e67ad
I was looking for the except clause.
So
select *
from a.1
where (all columns in a.1) not in (select * from b.1);
can be written as
select * from a.1
except
select * from b.1
In the db<>fiddle I give an explicit example of what I wanted.
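A small sketch of the EXCEPT idea applied in both directions, so that one query lists the rows that are not in both tables and tags each with its source. It uses sqlite3 with hypothetical tables a1 and b1 (names like a.1 would need quoting in most databases); as the question asks, only the two table names change between comparisons.

```python
import sqlite3

def diff_tables(conn, left, right):
    """Rows present in only one of two identically structured tables, tagged by source."""
    # Table names are interpolated directly: only pass trusted identifiers.
    query = f"""
        SELECT '{left}' AS src, * FROM (SELECT * FROM {left} EXCEPT SELECT * FROM {right})
        UNION ALL
        SELECT '{right}' AS src, * FROM (SELECT * FROM {right} EXCEPT SELECT * FROM {left})
    """
    return sorted(conn.execute(query).fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a1 (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE b1 (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO a1 VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])
conn.executemany("INSERT INTO b1 VALUES (?, ?)", [(1, "x"), (2, "Y"), (4, "w")])

print(diff_tables(conn, "a1", "b1"))
# [('a1', 2, 'y'), ('a1', 3, 'z'), ('b1', 2, 'Y'), ('b1', 4, 'w')]
```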
If you have a primary key to match rows between the tables, then you can try a full anti-join. For example:
select a.id as aid, b.id as bid
from a
full join b on b.id = a.id
where a.id is null or b.id is null
If the tables are:
A: 1, 2, 3
B: 1, 2, 4
The result is:
AID   BID
----  ----
null  4     -- means it's present in B, but not in A
3     null  -- means it's present in A, but not in B
See running example at db<>fiddle.
Of course, if your tables do not have a primary key, or if the rows are inconsistent (same PK, different data), then you'll need to adjust the query.
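The anti-join can be tried in sqlite3 with the same sample data (A: 1, 2, 3; B: 1, 2, 4). This sketch emulates the FULL JOIN with two LEFT JOINs combined by UNION ALL, since SQLite only supports FULL JOIN from version 3.39; the result should match the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [(1,), (2,), (4,)])

# FULL JOIN ... WHERE a.id IS NULL OR b.id IS NULL, emulated as two one-sided anti-joins.
rows = conn.execute("""
    SELECT a.id AS aid, b.id AS bid FROM a LEFT JOIN b ON b.id = a.id WHERE b.id IS NULL
    UNION ALL
    SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id WHERE a.id IS NULL
""").fetchall()
print(set(rows))  # (3, None): only in A; (None, 4): only in B
```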
As an alternative you can try this:
-- rn numbers rows within each c1 value, so repeated rows are compared one copy at a time
select 'a1' t, x.* from (
select a1.*, row_number() over (partition by c1 order by 1) as rn from a1
minus
select b1.*, row_number() over (partition by c1 order by 1) as rn from b1
) x
union all
select 'b1' t, y.* from (
select b1.*, row_number() over (partition by c1 order by 1) as rn from b1
minus
select a1.*, row_number() over (partition by c1 order by 1) as rn from a1
) y
fiddle
Edit: you can shorten the query by calculating the rn column once per table (for example in a CTE) instead of repeating the same calculation.
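A sketch of the row_number idea with rn precalculated once per table in CTEs, run in sqlite3 (ROW_NUMBER() needs SQLite 3.25 or later, and EXCEPT plays the role of Oracle's MINUS). The tables a1, b1 and columns c1, c2 are hypothetical. Unlike the answer, it partitions by every column, so only truly identical rows are numbered 1, 2, ...; with PARTITION BY c1 alone the numbering within a c1 value is arbitrary, which may report differences between rows that actually match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a1 (c1 INTEGER, c2 TEXT)")
conn.execute("CREATE TABLE b1 (c1 INTEGER, c2 TEXT)")
conn.executemany("INSERT INTO a1 VALUES (?, ?)", [(1, "x"), (1, "x"), (2, "y")])
conn.executemany("INSERT INTO b1 VALUES (?, ?)", [(1, "x"), (2, "y"), (2, "y")])

# Number identical rows 1, 2, ... so EXCEPT compares copies, not just distinct values.
# Each table's numbering is computed once, in its own CTE.
rows = conn.execute("""
    WITH na AS (
        SELECT c1, c2, ROW_NUMBER() OVER (PARTITION BY c1, c2 ORDER BY c1) AS rn FROM a1
    ), nb AS (
        SELECT c1, c2, ROW_NUMBER() OVER (PARTITION BY c1, c2 ORDER BY c1) AS rn FROM b1
    )
    SELECT 'a1' AS t, * FROM (SELECT * FROM na EXCEPT SELECT * FROM nb)
    UNION ALL
    SELECT 'b1' AS t, * FROM (SELECT * FROM nb EXCEPT SELECT * FROM na)
""").fetchall()
print(sorted(rows))  # [('a1', 1, 'x', 2), ('b1', 2, 'y', 2)]
```

Each reported row carries its copy number, so the output shows that a1 has one extra (1, 'x') and b1 one extra (2, 'y'), which a plain EXCEPT would miss.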

Merge two different tables with Access

I would like to merge two different tables that have mostly the same columns. The only columns that differ are Amount-F21 and Amount-A21. My issue is that when I write the query with UNION ALL in Access, the column Amount-A21 disappears from the result, but I need it.
SELECT * FROM [Source Alloc-A21]
UNION ALL
SELECT * FROM [Source Alloc-F21]
To use the star notation (Table.*) with UNION, both tables must have the same columns in the same order, because UNION matches columns by position and takes the column names from the first SELECT. If they don't, you need to select the individual columns and provide default values for the columns that each table is missing.
For example:
SELECT TableA.A, TableA.B, TableA.[Amount-F21], 0 AS [Amount-A21]
FROM TableA
UNION ALL
SELECT TableB.A, TableB.B, 0 AS [Amount-F21], TableB.[Amount-A21]
FROM TableB
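This will report 0 for the column each table is missing (Amount-F21 or Amount-A21).
You can then group by the shared columns and sum the amounts, so that each (A, B) combination appears only once and the zero (default) values no longer show up as separate rows.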
SELECT T.A, T.B, SUM(T.[Amount-F21]) AS [Amount-F21], SUM(T.[Amount-A21]) AS [Amount-A21]
FROM (
SELECT TableA.A, TableA.B, TableA.[Amount-F21], 0 AS [Amount-A21]
FROM TableA
UNION ALL
SELECT TableB.A, TableB.B, 0 AS [Amount-F21], TableB.[Amount-A21]
FROM TableB
) AS T
GROUP BY T.A, T.B
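The final query can be checked in sqlite3, which also accepts Access-style [bracketed] names. TableA, TableB and the sample values are hypothetical; the SQL is the answer's, with an ORDER BY added so the output is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: shared columns A, B plus one amount column each.
conn.execute("CREATE TABLE TableA (A TEXT, B TEXT, [Amount-F21] REAL)")
conn.execute("CREATE TABLE TableB (A TEXT, B TEXT, [Amount-A21] REAL)")
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?)", [("k1", "x", 10.0), ("k2", "y", 5.0)])
conn.executemany("INSERT INTO TableB VALUES (?, ?, ?)", [("k1", "x", 7.0)])

# Pad each side with 0 for the column it lacks, then sum per (A, B).
rows = conn.execute("""
    SELECT T.A, T.B, SUM(T.[Amount-F21]) AS [Amount-F21], SUM(T.[Amount-A21]) AS [Amount-A21]
    FROM (
        SELECT A, B, [Amount-F21], 0 AS [Amount-A21] FROM TableA
        UNION ALL
        SELECT A, B, 0 AS [Amount-F21], [Amount-A21] FROM TableB
    ) AS T
    GROUP BY T.A, T.B
    ORDER BY T.A
""").fetchall()
print(rows)  # one row per (A, B), both amounts side by side
```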

How to count the number of distinct values for each specific

I have a database containing two separate fields A and B. I want to find out if for any given value of A there are multiple rows with different values of B.
I have tried using GROUP BY and DISTINCT, but I am doing something wrong: I keep getting values of A for which, when I query that specific value, all the values of B are the same. I have tried variants of the following, including:
SELECT COUNT(B) FROM ex1 GROUP BY A HAVING COUNT(*) > 1;
SELECT COUNT(DISTINCT B) FROM ex1 GROUP BY A HAVING COUNT(DISTINCT B) > 1;
Strangely, this last one wound up giving me results where for a given value of B there were multiple values of A, which is backwards from what I wanted. I tried reversing A and B in the last query but that wound up giving me cases where A only had a single value of B.
How can I get records for only where there is a specific value of A in multiple records, each of which has a different value for B?
Give this a try:
"records for only where there is a specific value of A in multiple records, each of which has a different value for B?"
SELECT DISTINCT ex1a.A
FROM ex1 ex1a
WHERE
(SELECT COUNT(ex1b.B) FROM ex1 ex1b WHERE ex1a.A=ex1b.A)
= (SELECT COUNT(DISTINCT ex1b.B) FROM ex1 ex1b WHERE ex1a.A=ex1b.A)
AND
(SELECT COUNT(ex1c.B) FROM ex1 ex1c WHERE ex1a.A = ex1c.A) > 1
You can remove the last condition (the AND ... > 1 subquery) if you also want to include values of A that have just one record.
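To compare the two readings of the question, here is a sqlite3 sketch with a hypothetical ex1 table. It also selects A next to the count, which the queries in the question did not do; without A in the output it is hard to tell which groups matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ex1 (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO ex1 VALUES (?, ?)", [
    (1, 10), (1, 20),           # two rows, two different B values
    (2, 30), (2, 30),           # two rows, same B value
    (3, 40),                    # a single row
    (4, 50), (4, 60), (4, 60),  # different B values, but one is repeated
])

# The answer's query: every row for A has a different B, and there is more than one row.
all_distinct = conn.execute("""
    SELECT DISTINCT ex1a.A FROM ex1 ex1a
    WHERE (SELECT COUNT(ex1b.B) FROM ex1 ex1b WHERE ex1a.A = ex1b.A)
        = (SELECT COUNT(DISTINCT ex1b.B) FROM ex1 ex1b WHERE ex1a.A = ex1b.A)
      AND (SELECT COUNT(ex1c.B) FROM ex1 ex1c WHERE ex1a.A = ex1c.A) > 1
    ORDER BY ex1a.A
""").fetchall()

# Looser reading: A has at least two different B values (repeats allowed).
at_least_two = conn.execute("""
    SELECT A, COUNT(DISTINCT B) FROM ex1 GROUP BY A HAVING COUNT(DISTINCT B) > 1 ORDER BY A
""").fetchall()

print(all_distinct)  # [(1,)]
print(at_least_two)  # [(1, 2), (4, 2)]
```

A = 4 is where the two readings differ: it has two distinct B values, but not every row has a different B.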
This should work:
create table want as
select a, count(*) as cnt
from (
-- one row per distinct (a, b) pair
select a, b from have
group by a, b
) t
group by a
having count(*) > 1;
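A sqlite3 sketch of the nested GROUP BY (SQLite supports CREATE TABLE ... AS SELECT), next to the single-pass COUNT(DISTINCT b) form. The table have and its rows are hypothetical; the two results agree here because b has no NULLs, whereas the nested form would count a NULL b as one more group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO have VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 30), (2, 30), (3, 40), (4, 50), (4, 60), (4, 60)])

# Inner query collapses duplicate (a, b) pairs; outer query counts the pairs left per a.
conn.execute("""
    CREATE TABLE want AS
    SELECT a, COUNT(*) AS cnt
    FROM (SELECT a, b FROM have GROUP BY a, b) t
    GROUP BY a
    HAVING COUNT(*) > 1
""")
nested = conn.execute("SELECT a, cnt FROM want ORDER BY a").fetchall()

# Same result with COUNT(DISTINCT b) in a single pass.
direct = conn.execute("""
    SELECT a, COUNT(DISTINCT b) FROM have GROUP BY a HAVING COUNT(DISTINCT b) > 1 ORDER BY a
""").fetchall()

print(nested == direct, nested)  # True [(1, 2), (4, 2)]
```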

How can I find out the relationship between two columns in database?

I have a view defined in SQL Server database and it has two columns A and B, both of which have the type of INT. I want to find out the relationship between these two, 1 to 1 or 1 to many or many to many. Is there a SQL statement I can use to find out?
By relationship, I mean: for a given value of A, how many values of B map to it? If there is only one, then it is a 1-to-1 mapping.
You could use CTEs to generate COUNTs of how many distinct A values were associated with each B value and vice versa, then take the MAX of those values to determine if the relationship is 1 or many on each side. For example:
WITH CTEA AS (
SELECT COUNT(DISTINCT B) ac
FROM t
GROUP BY A
),
CTEB AS (
SELECT COUNT(DISTINCT A) bc
FROM t
GROUP BY B
)
SELECT CONCAT(
CASE WHEN MAX(bc) = 1 THEN '1' ELSE 'many' END,
' to ',
CASE WHEN MAX(ac) = 1 THEN '1' ELSE 'many' END
) AS [A to B]
FROM CTEA
CROSS JOIN CTEB
Note that whenever a side of the relationship is reported as 1, it may still be many in principle; the data currently in the table just doesn't contain such a case.
Demo on dbfiddle
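The CTE approach can be run in sqlite3 on three small hypothetical data sets, one per kind of relationship. Older SQLite versions lack CONCAT, so this sketch joins the strings with the standard || operator instead.

```python
import sqlite3

def make_table(pairs):
    """Build an in-memory table t(A, B) from a list of (A, B) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (A INTEGER, B INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", pairs)
    return conn

def relationship(conn):
    # ac: distinct B values per A; bc: distinct A values per B.
    return conn.execute("""
        WITH CTEA AS (SELECT COUNT(DISTINCT B) AS ac FROM t GROUP BY A),
             CTEB AS (SELECT COUNT(DISTINCT A) AS bc FROM t GROUP BY B)
        SELECT CASE WHEN MAX(bc) = 1 THEN '1' ELSE 'many' END
               || ' to ' ||
               CASE WHEN MAX(ac) = 1 THEN '1' ELSE 'many' END AS [A to B]
        FROM CTEA CROSS JOIN CTEB
    """).fetchone()[0]

print(relationship(make_table([(1, 10), (2, 20)])))           # 1 to 1
print(relationship(make_table([(1, 10), (1, 11), (2, 20)])))  # 1 to many
print(relationship(make_table([(1, 10), (1, 11), (2, 10)])))  # many to many
```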
Assuming you have no NULL values:
select (case when count(*) = count(distinct a) and
count(*) = count(distinct b)
then '1-1'
when count(*) = count(distinct a) or
count(*) = count(distinct b)
then '1-many'
else 'many-many'
end)
from t;
Note: This does not distinguish between 1-many for a-->b or b-->a.
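The same three-way classification in sqlite3, as a sketch with hypothetical data. Besides having no NULLs, it assumes there are no duplicate (a, b) rows; otherwise COUNT(*) overcounts and a 1-1 mapping can come out as many-many, so apply SELECT DISTINCT a, b first in that case.

```python
import sqlite3

def classify(pairs):
    """Classify the a/b relationship in a list of (a, b) pairs using the counting query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", pairs)
    return conn.execute("""
        SELECT CASE WHEN COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b) THEN '1-1'
                    WHEN COUNT(*) = COUNT(DISTINCT a) OR COUNT(*) = COUNT(DISTINCT b) THEN '1-many'
                    ELSE 'many-many'
               END
        FROM t
    """).fetchone()[0]

print(classify([(1, 10), (2, 20)]))           # 1-1
print(classify([(1, 10), (1, 11), (2, 20)]))  # 1-many (one a, several b)
print(classify([(1, 10), (2, 10)]))           # 1-many (one b, several a): direction not shown
print(classify([(1, 10), (1, 11), (2, 10)]))  # many-many
```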
You would use count and group by to get this information.
--This gives the number of distinct values of b that map to each value of a. If at least one row has a count greater than 1, then the mapping between a and b is one-to-many.
select a, count(distinct b)
from t
group by a
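If every value of a has a count of 1, then each a maps to exactly one b. That alone does not make the mapping one-to-one: several values of a could still share the same b, so run the same query grouped by b to check the other direction.
A caveat: NULL values in b are ignored by COUNT(DISTINCT b), because aggregate functions skip NULLs, so a NULL b does not add to the count.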
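A sqlite3 sketch of the per-a count and of the NULL caveat, with hypothetical data. The second query is only one possible workaround, not part of the answer: it adds 1 for each a that has at least one NULL b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 5), (1, None), (2, 6), (2, 7)])

# Number of distinct b values per a; NULL b values are not counted.
per_a = conn.execute("SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a ORDER BY a").fetchall()
print(per_a)  # [(1, 1), (2, 2)]

# Workaround (an assumption, not from the answer): count NULL as one extra value when present.
per_a_with_null = conn.execute("""
    SELECT a, COUNT(DISTINCT b) + MAX(CASE WHEN b IS NULL THEN 1 ELSE 0 END)
    FROM t GROUP BY a ORDER BY a
""").fetchall()
print(per_a_with_null)  # [(1, 2), (2, 2)]
```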

Find Rows where the Same Two Column Values Recur

Given a table in SQL-Server like:
Id INTEGER
A VARCHAR(50)
B VARCHAR(50)
-- Some other columns
with no index on A or B, I want to find rows where the same combination of A and B occurs more than once.
I'm using the query
SELECT A+B, Count(A+B) FROM MyTable
GROUP BY A+B
HAVING COUNT(A+B) > 1
First Question
Is there a more time-efficient way to do this? (I cannot add indices to the database)
Second Question
When I try to format the output by including a comma in the concatenation:
SELECT A+','+B, Count(A+','+B) FROM MyTable
GROUP BY A+','+B
HAVING COUNT(A+','+B) > 1
The query fails with the error
Column 'MyDB.dbo.MyTable.A' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
with a similar error for Column B.
How can I format the output to separate the two columns?
It would seem more natural to me to write:
SELECT A, B, Count(*) FROM MyTable
GROUP BY A, B
HAVING COUNT(*) > 1
It is also the most efficient way of doing this without an index (and so is the query in the question): in both cases the table has to be scanned and aggregated once.
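A sqlite3 sketch (using || for concatenation, where SQL Server uses +) with hypothetical rows, showing why grouping on A, B is safer than grouping on A+B: different pairs can concatenate to the same string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER, A VARCHAR(50), B VARCHAR(50))")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)",
                 [(1, "ab", "c"), (2, "a", "bc"), (3, "x", "y"), (4, "x", "y")])

# Grouping on the concatenation: ('ab', 'c') and ('a', 'bc') both become 'abc'.
by_concat = conn.execute("""
    SELECT A || B, COUNT(*) FROM MyTable GROUP BY A || B HAVING COUNT(*) > 1 ORDER BY 1
""").fetchall()

# Grouping on the columns themselves keeps those pairs apart.
by_columns = conn.execute("""
    SELECT A, B, COUNT(*) FROM MyTable GROUP BY A, B HAVING COUNT(*) > 1 ORDER BY A, B
""").fetchall()

print(by_concat)   # [('abc', 2), ('xy', 2)]: 'abc' is a false duplicate
print(by_columns)  # [('x', 'y', 2)]
```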
Similarly to the above query, you can rewrite your second query:
SELECT A + ',' + B, Count(*) FROM MyTable
GROUP BY A, B
HAVING COUNT(*) > 1
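The same pattern in sqlite3: group on the raw columns and build the display string only in the SELECT list. Table contents are hypothetical, and || replaces SQL Server's + for concatenation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER, A VARCHAR(50), B VARCHAR(50))")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)",
                 [(1, "x", "y"), (2, "x", "y"), (3, "p", "q")])

# Group on the raw columns; format only the output.
rows = conn.execute("""
    SELECT A || ',' || B AS pair, COUNT(*) AS n
    FROM MyTable
    GROUP BY A, B
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)  # [('x,y', 2)]
```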