Find Rows where the Same Two Column Values Recur - sql

Given a table in SQL-Server like:
Id INTEGER
A VARCHAR(50)
B VARCHAR(50)
-- Some other columns
with no index on A or B, I wish to find combinations of A and B that occur in more than one row.
I'm using the query
SELECT A+B, Count(A+B) FROM MyTable
GROUP BY A+B
HAVING COUNT(A+B) > 1
First Question
Is there a more time-efficient way to do this? (I cannot add indices to the database)
Second Question
When I attempt to gain some formatting of the output by including a , in the concatenation:
SELECT A+','+B, Count(A+','+B) FROM MyTable
GROUP BY A+','+B
HAVING COUNT(A+','+B) > 1
The query fails with the error
Column 'MyDB.dbo.MyTable.A' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
with a similar error for Column B.
How can I format the output to separate the two columns?

It would seem more natural to me to write:
SELECT A, B, Count(*) FROM MyTable
GROUP BY A, B
HAVING COUNT(*) > 1
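Without an index on A or B the table has to be scanned either way, so this is about as efficient as it gets. The query in the question should perform similarly, but grouping on A+B can merge different pairs (for example 'ab','c' and 'a','bc' both concatenate to 'abc'), so grouping on the two columns is also the safer choice.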
Similarly to the above query, you can rewrite your second query:
SELECT A + ',' + B, Count(*) FROM MyTable
GROUP BY A, B
HAVING COUNT(*) > 1
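A minimal sketch of the difference between grouping on the concatenation and grouping on the two columns. It assumes SQLite (via Python's sqlite3) as a stand-in for SQL Server, so concatenation is written || instead of +, and the sample rows are invented for illustration:

```python
import sqlite3

# SQLite stands in for SQL Server here; it concatenates with || instead of +.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER, A VARCHAR(50), B VARCHAR(50))")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    # Hypothetical rows: ('ab','c') and ('a','bc') are different pairs,
    # ('x','y') is a genuine duplicate.
    [(1, "ab", "c"), (2, "a", "bc"), (3, "x", "y"), (4, "x", "y")],
)

# Grouping on the concatenated string reports 'abc' as a duplicate too.
by_concat = conn.execute(
    "SELECT A || B, COUNT(*) FROM MyTable "
    "GROUP BY A || B HAVING COUNT(*) > 1 ORDER BY 1"
).fetchall()

# Grouping on both columns reports only the real duplicate pair.
by_columns = conn.execute(
    "SELECT A, B, COUNT(*) FROM MyTable "
    "GROUP BY A, B HAVING COUNT(*) > 1 ORDER BY A, B"
).fetchall()

print(by_concat)   # [('abc', 2), ('xy', 2)]
print(by_columns)  # [('x', 'y', 2)]
```

Adding a separator such as ',' to the concatenation reduces the collisions but does not remove them if the separator can appear in the data, which is another reason to group on A, B directly.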

Related

How to count the number of distinct values for each specific

I have a database containing two separate fields A and B. I want to find out if for any given value of A there are multiple rows with different values of B.
I have tried using GROUP BY and DISTINCT but I am doing something wrong: I keep getting values of A for which, when I query that specific value, all the values of B turn out to be the same. I have tried variants on the following, including:
SELECT COUNT(B) FROM ex1 GROUP BY A HAVING COUNT(*) > 1;
SELECT COUNT(DISTINCT B) FROM ex1 GROUP BY A HAVING COUNT(DISTINCT B) > 1;
Strangely, this last one wound up giving me results where for a given value of B there were multiple values of A, which is backwards from what I wanted. I tried reversing A and B in the last query but that wound up giving me cases where A only had a single value of B.
How can I get records for only where there is a specific value of A in multiple records, each of which has a different value for B?
Give this a try:
"records for only where there is a specific value of A in multiple records, each of which has a different value for B?"
SELECT DISTINCT ex1a.A
FROM ex1 ex1a
WHERE
(SELECT COUNT(ex1b.B) FROM ex1 ex1b WHERE ex1a.A=ex1b.A)
= (SELECT COUNT(DISTINCT ex1b.B) FROM ex1 ex1b WHERE ex1a.A=ex1b.A)
AND
(SELECT COUNT(ex1c.B) FROM ex1 ex1c WHERE ex1a.A = ex1c.A) > 1
And, you can remove the last SELECT if you want to include the case where there is just 1 (distinct) record for A and B.
This should work:
create table want as
select a, count(*) as cnt from (
select a, b, count(*) as num from have
group by a, b) as pairs
group by a having count(*) > 1;
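As a sketch of a simpler variant, the grouped query the asker tried can be made to report the offending values by selecting A alongside the count. This reads the requirement as "A has more than one distinct B", which is an assumption; the table contents are invented, and SQLite is used only as a convenient test bed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ex1 (A INTEGER, B TEXT)")
conn.executemany(
    "INSERT INTO ex1 VALUES (?, ?)",
    # Hypothetical data: A=1 has two different Bs, A=2 repeats the same B,
    # A=3 occurs only once.
    [(1, "p"), (1, "q"), (2, "r"), (2, "r"), (3, "s")],
)

# Selecting A makes it visible which group each count belongs to.
result = conn.execute(
    "SELECT A, COUNT(DISTINCT B) FROM ex1 "
    "GROUP BY A HAVING COUNT(DISTINCT B) > 1 ORDER BY A"
).fetchall()

print(result)  # [(1, 2)]
```

A=2 is excluded even though it appears in two rows, because both rows carry the same value of B.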

How can I find out the relationship between two columns in database?

I have a view defined in a SQL Server database with two columns A and B, both of type INT. I want to find out the relationship between the two: 1 to 1, 1 to many, or many to many. Is there a SQL statement I can use to find out?
By relationship I mean: for a given value of A, how many values of B map to it? If there is only one value, then it is a 1 to 1 mapping.
You could use CTEs to generate COUNTs of how many distinct A values were associated with each B value and vice versa, then take the MAX of those values to determine if the relationship is 1 or many on each side. For example:
WITH CTEA AS (
SELECT COUNT(DISTINCT B) ac
FROM t
GROUP BY A
),
CTEB AS (
SELECT COUNT(DISTINCT A) bc
FROM t
GROUP BY B
)
SELECT CONCAT(
CASE WHEN MAX(bc) = 1 THEN '1' ELSE 'many' END,
' to ',
CASE WHEN MAX(ac) = 1 THEN '1' ELSE 'many' END
) AS [A to B]
FROM CTEA
CROSS JOIN CTEB
Note that any time a relationship is listed as 1, it may actually be many but just not showing that because of limited data in the table.
Assuming you have no NULL values:
select (case when count(*) = count(distinct a) and
count(*) = count(distinct b)
then '1-1'
when count(*) = count(distinct a) or
count(*) = count(distinct b)
then '1-many'
else 'many-many'
end)
from t;
Note: This does not distinguish between 1-many for a-->b or b-->a.
You would use COUNT and GROUP BY to get this information.
--This gives, for every value of a, the number of distinct values of b that map to it. If at least one row has a count greater than 1, the mapping from a to b is one-to-many.
select a, count(distinct b)
from t
group by a
If every row has a count of exactly one, each value of a maps to a single value of b; run the same query with a and b swapped to check the other direction before calling the mapping one-to-one.
One caveat: NULLs in b are ignored by count(distinct b), so rows where b is NULL do not contribute to the count.
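The counting approaches above can be sketched end to end as follows. This is a minimal illustration, assuming SQLite as a test bed, a hypothetical helper named max_distinct, and invented sample rows; the label follows the "A per B to B per A" convention of the CTE answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
# Hypothetical data: a=1 maps to two different b values; every b has one a.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (1, 11), (2, 12)])


def max_distinct(conn, counted, grouped):
    """Largest number of distinct `counted` values seen for any `grouped` value.

    Column names are interpolated into the SQL, so pass trusted identifiers only.
    """
    sql = (
        f"SELECT MAX(cnt) FROM "
        f"(SELECT COUNT(DISTINCT {counted}) AS cnt FROM t GROUP BY {grouped})"
    )
    return conn.execute(sql).fetchone()[0]


b_per_a = max_distinct(conn, "b", "a")  # how many Bs one A can have
a_per_b = max_distinct(conn, "a", "b")  # how many As one B can have

label = ("1" if a_per_b == 1 else "many") + " to " + ("1" if b_per_a == 1 else "many")
print(b_per_a, a_per_b, label)  # 2 1 1 to many
```

As with the SQL versions, a side reported as "1" only reflects the rows currently in the table.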

Efficient way to verify a table is a subset of another table

I have two tables A and B whose structures are exactly the same. I need to verify that A is a subset of B. Because the structure contains over 100 fields, I do not want to list them one by one in a WHERE predicate.
I would like to know if there is an easier way to do that.
Assumptions:
(1) Identical table structure of A and B. This means that both order of columns and their data types have to match.
(2) There are no duplicate rows in table A
Problem description
To prove that A is a subset of B you need to show that A\B = empty set.
Solution
If you remove every row in A that has a matching row in B and the output is empty (0 rows), then A is a subset of B.
If, on the other hand, the output contains more than 0 rows, A has rows that B doesn't, and A isn't a subset of B.
SELECT * FROM A
EXCEPT
SELECT * FROM B
When A is empty (contains 0 rows) it will be treated as a subset of B, because the result of above query will be 0 rows.
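A small sketch of the EXCEPT check in action, assuming SQLite (which also supports EXCEPT) as a stand-in and two-column tables with invented rows; the helper name a_is_subset_of_b is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two tables with identical structure, as the answer assumes.
conn.execute("CREATE TABLE A (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE B (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO B VALUES (?, ?)", [(1, "x"), (2, "y")])
conn.execute("INSERT INTO A VALUES (1, 'x')")


def a_is_subset_of_b(conn):
    # Rows of A with no identical row in B; an empty result means A is a subset.
    leftover = conn.execute("SELECT * FROM A EXCEPT SELECT * FROM B").fetchall()
    return len(leftover) == 0


before = a_is_subset_of_b(conn)
conn.execute("INSERT INTO A VALUES (3, 'z')")  # a row that B does not contain
after = a_is_subset_of_b(conn)

print(before, after)  # True False
```

Because EXCEPT compares whole rows, no column list is needed, which is what makes it attractive for tables with 100+ fields.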
@robertoplancarte's approach with a little tweaking:
WITH TB_cnt AS
(
SELECT COUNT(*) cnt FROM
(
SELECT DISTINCT * FROM dbo.T_B
) T_B
), TAB_cnt AS
(
SELECT COUNT(*) cnt FROM
(
SELECT * FROM dbo.T_B
UNION
SELECT * FROM dbo.T_A
) T_AB
)
SELECT
CASE WHEN TB_cnt.cnt = TAB_cnt.cnt THEN
'Table A is subset of B'
ELSE
'Table A is not subset of B'
END AS Result
FROM TAB_cnt, TB_cnt

How to select rows m through n in access?

I am modifying a query for a sub-report in Access 2016 and need to select a set of rows, not all rows. By default the query generated looks like:
SELECT table_name.a, table_name.b, table_name.c
FROM table_name
WHERE (((table_name.dist_ft)<3001));
How can I select only rows m through n instead of all rows?
Thanks for your insights!
Edit: an additional clarification. When I run the query like
SELECT TOP 16 *
FROM table_name
WHERE (((table_name.dist_ft)<3001));
... or any other variation I've tried with TOP my sub-report does not get populated. It only contains data when all fields are selected and TOP is not used. I must be missing something.
Is ID your "row number" here?
SELECT table_name.a, table_name.b, table_name.c
FROM table_name
WHERE table_name.dist_ft<3001
AND table_name.ID>=m
AND table_name.ID<=n
;
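Updated with a more general case based on comments (n and m-1 stand for literal numbers, and both subqueries need the same ORDER BY):
SELECT table_name.a, table_name.b, table_name.c
FROM table_name
WHERE table_name.id IN
(SELECT TOP n table_name.id FROM table_name ORDER BY table_name.id)
AND table_name.id NOT IN
(SELECT TOP m-1 table_name.id FROM table_name ORDER BY table_name.id)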
Records m to n are the records 1 to n minus the records 1 to m-1. Be aware though that you need an ORDER BY clause for a TOP clause to make sense.
Here is an example with m = 31 to n = 40 and an order by all three selected columns. MS Access does not support EXCEPT so we cannot subtract the two data sets, which would be the straight-forward way to go. We could also express the desired result as the top n where (a,b,c) not in top m-1, but MS Access does not support an IN clause on multiple columns either. So I am using an anti join here (for which I select dist_ft, but it can be any non-nullable column of the table).
If your table has a unique ID column, you can use a more readable where id not in (select top 30 id ...) instead of an anti join. Either way, make sure to apply the same WHERE clause (dist_ft < 3001 in your case) and ORDER BY clause (e.g. ORDER BY a, b, c) to the main query and subquery.
SELECT TOP 40 t.a, t.b, t.c
FROM table_name t
LEFT JOIN
(
SELECT TOP 30 a, b, c, dist_ft
FROM table_name
WHERE dist_ft < 3001
ORDER BY a, b, c
) no ON no.a = t.a AND no.b = t.b AND no.c = t.c
WHERE t.dist_ft < 3001
AND no.dist_ft is null
ORDER BY t.a, t.b, t.c;
MS Access is known for requiring additional parentheses on multiple joins. I cannot say whether the above query works straight away or whether parentheses must be added somewhere.
You sort thrice to get the result you are after. Let's say we want rows 31 to 40:
1. Sort and get the top 40.
2. Sort in reverse order and get the top 10.
3. Sort again to get the order you actually want.
The query:
SELECT a, b, c
FROM
(
SELECT TOP 10 a, b, c
FROM
(
SELECT TOP 40 a, b, c
FROM table_name
WHERE dist_ft < 3001
ORDER BY a, b, c
) top_n
ORDER BY a desc, b desc, c desc
) top_m_to_n
ORDER BY a, b, c;
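The three-sort idea can be tried out with the sketch below. It is not Access SQL: it assumes SQLite as a test bed, swaps TOP for SQLite's LIMIT, and uses a simplified table_name with a single sort column a and invented data, so it only illustrates the logic for rows 31 to 40:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (a INTEGER, dist_ft INTEGER)")
# Hypothetical data: 100 rows, all of which pass the dist_ft < 3001 filter.
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 101)],
)

sql = """
SELECT a FROM (
    SELECT a FROM (
        SELECT a FROM table_name
        WHERE dist_ft < 3001
        ORDER BY a
        LIMIT 40              -- step 1: first 40 rows in the wanted order
    ) AS top_n
    ORDER BY a DESC
    LIMIT 10                  -- step 2: last 10 of those, i.e. rows 31..40
) AS top_m_to_n
ORDER BY a                    -- step 3: restore the wanted order
"""
rows = [r[0] for r in conn.execute(sql)]
print(rows)  # [31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
```

The inner LIMIT is n and the middle LIMIT is n - m + 1, so other ranges only need those two numbers changed.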

Counting data with SQL

I have a table like the first table below (sorry if it doesn't show up correctly; I'm new to StackOverflow and haven't quite gotten the hang of how to show tables in a question). I've already received help doing a count of IDs that are not duplicated. I don't mean a distinct count: that would return 7 (a, b, c, d, e, f, g), whereas I wanted 4 (a, c, d, f), the IDs that do not have multiple type codes. Now I need to take it a step further and show, for each type code, how many IDs have only that single type code. For example, we want to see a result like the second table below: there are 2 IDs that have the single type code 444 (c, f), and one ID each with the single type code 111 (a) and 222 (d).
For reference, the query that got me the count of IDs that have only one type code is
select count(*) from
(select id from
mytable
group by id
having count(*) =1) t
ID|type code
a|111
b|222
b|333
c|444
d|222
e|111
e|333
e|555
f|444
g|333
g|444
Type Code|Count
111|1
222|1
444|2
Maybe this is what you're asking for?
SELECT [type code],
COUNT(*) [count]
FROM mytable
WHERE [ID] IN ( SELECT [ID]
FROM mytable
GROUP BY [ID]
HAVING COUNT([type code]) = 1)
GROUP BY [type code]
You can solve this using a nested aggregation:
SELECT type_code, COUNT(*)
FROM
( -- as this looks for a single row you can simply add the type code
SELECT ID, MIN(type_code) as type_code
FROM mytable
GROUP BY ID
HAVING COUNT(*) = 1
) AS dt
GROUP BY type_code
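To check the nested aggregation against the sample data from the question, here is a minimal sketch. It assumes SQLite as a test bed and a column named type_code (as in the second answer) instead of the bracketed [type code]:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID TEXT, type_code INTEGER)")
# The rows from the question's first table.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("a", 111), ("b", 222), ("b", 333), ("c", 444), ("d", 222),
        ("e", 111), ("e", 333), ("e", 555), ("f", 444), ("g", 333), ("g", 444),
    ],
)

counts = conn.execute(
    """
    SELECT type_code, COUNT(*)
    FROM (
        -- IDs with exactly one row; MIN() just picks up that row's type code
        SELECT ID, MIN(type_code) AS type_code
        FROM mytable
        GROUP BY ID
        HAVING COUNT(*) = 1
    ) AS dt
    GROUP BY type_code
    ORDER BY type_code
    """
).fetchall()

print(counts)  # [(111, 1), (222, 1), (444, 2)]
```

The output matches the second table in the question: one ID for 111, one for 222, and two for 444.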