How can I find out the relationship between two columns in a database?

I have a view defined in SQL Server database and it has two columns A and B, both of which have the type of INT. I want to find out the relationship between these two, 1 to 1 or 1 to many or many to many. Is there a SQL statement I can use to find out?
By relationship I mean: for a given value of A, how many values of B map to it. If there is only one, then it is a 1 to 1 mapping.

You could use CTEs to generate COUNTs of how many distinct A values were associated with each B value and vice versa, then take the MAX of those values to determine if the relationship is 1 or many on each side. For example:
WITH CTEA AS (
    SELECT COUNT(DISTINCT B) ac
    FROM t
    GROUP BY A
),
CTEB AS (
    SELECT COUNT(DISTINCT A) bc
    FROM t
    GROUP BY B
)
SELECT CONCAT(
           CASE WHEN MAX(bc) = 1 THEN '1' ELSE 'many' END,
           ' to ',
           CASE WHEN MAX(ac) = 1 THEN '1' ELSE 'many' END
       ) AS [A to B]
FROM CTEA
CROSS JOIN CTEB
Note that any time a relationship is listed as 1, it may actually be many but just not showing that because of limited data in the table.
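To see the CTE approach classify a few small data sets, here is a minimal sketch that runs the same query shape in an in-memory SQLite database from Python. SQLite stands in for SQL Server here, so `||` replaces CONCAT and the bracketed alias is dropped; the helper name `relationship` and the sample rows are made up for illustration.

```python
import sqlite3

def relationship(rows):
    """Classify the A-to-B relationship for a list of (a, b) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        WITH ctea AS (SELECT COUNT(DISTINCT b) AS ac FROM t GROUP BY a),
             cteb AS (SELECT COUNT(DISTINCT a) AS bc FROM t GROUP BY b)
        SELECT CASE WHEN MAX(bc) = 1 THEN '1' ELSE 'many' END
               || ' to ' ||
               CASE WHEN MAX(ac) = 1 THEN '1' ELSE 'many' END
        FROM ctea CROSS JOIN cteb
    """
    (label,) = conn.execute(query).fetchone()
    conn.close()
    return label

print(relationship([(1, 10), (2, 20)]))           # one B per A and one A per B
print(relationship([(1, 10), (1, 11), (2, 20)]))  # A=1 maps to two B values
print(relationship([(1, 10), (1, 11), (2, 10)]))  # fan-out in both directions
```

The three calls should print "1 to 1", "1 to many" and "many to many" respectively.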

Assuming you have no NULL values:
select (case when count(*) = count(distinct a) and
                  count(*) = count(distinct b)
             then '1-1'
             when count(*) = count(distinct a) or
                  count(*) = count(distinct b)
             then '1-many'
             else 'many-many'
        end)
from t;
Note: This does not distinguish between 1-many for a-->b or b-->a.
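The direction caveat can be checked with a small sketch that runs the same CASE expression in SQLite from Python. The helper name `classify` and the sample rows are hypothetical. The last call also suggests the query expects the (a, b) rows to be distinct, since a repeated row changes count(*) without changing either distinct count.

```python
import sqlite3

def classify(rows):
    """Run the single-pass count comparison on a list of (a, b) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    (label,) = conn.execute("""
        SELECT CASE WHEN COUNT(*) = COUNT(DISTINCT a)
                     AND COUNT(*) = COUNT(DISTINCT b) THEN '1-1'
                    WHEN COUNT(*) = COUNT(DISTINCT a)
                      OR COUNT(*) = COUNT(DISTINCT b) THEN '1-many'
                    ELSE 'many-many' END
        FROM t
    """).fetchone()
    conn.close()
    return label

print(classify([(1, 10), (1, 11)]))  # one a, two b values
print(classify([(1, 10), (2, 10)]))  # two a values, one b: same label as above
print(classify([(1, 10), (1, 10)]))  # a duplicated row is not reported as 1-1
```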

You would use count and group by to get this information.
--This gives you, for every value of a, the count of distinct values of b that map to it. If at least one row has a count greater than 1, the mapping from a to b is one to many.
select a,count( distinct b)
from table
group by a
If every row has a count of 1, then each value of a maps to exactly one value of b, i.e. the mapping is one to one in that direction.
A caveat: NULLs in b are ignored by COUNT(DISTINCT b), so a value of a whose only b is NULL gets a count of 0.
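A quick way to see the NULL caveat is to compare COUNT(DISTINCT b) with COUNT(*) per group. This is a sketch in SQLite via Python with made-up rows; the table name `t` is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
# a=1 has one real b and one NULL; a=2 has only a NULL b.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (1, None), (2, None)])

per_a = conn.execute(
    "SELECT a, COUNT(DISTINCT b), COUNT(*) FROM t GROUP BY a ORDER BY a"
).fetchall()
conn.close()

# Each tuple is (a, distinct non-NULL b values, total rows for that a).
print(per_a)
```

For a=2 the distinct count is 0 even though a row exists, so a check for "count greater than 1" silently skips NULL mappings.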

Related

Combine all rows in a column where most rows are null

I am writing a query trying to match true account IDs to incorrect account IDs across two tables using the following query:
SELECT DISTINCT
p.visitor_id,
CASE WHEN p.visitor_id = c.username then accountId else null end as correct_account_id,
CASE WHEN c.accountId is null then p.account_id else null end as incorrect_account_id
FROM `a_table` p
LEFT JOIN `another_table` c
ON p.account_id = c.accountID
and am getting this result (single visitor_id subset):
visitor_id  correct_account_id  incorrect_account_id
1           null                id
1           id                  null
1           null                null
I would like to create one row per visitor_id where there are no null values and just the two ids are listed.
It sounds like you want:
SELECT p.visitor_id,
MAX(CASE WHEN p.visitor_id = c.username then accountId else null end) as correct_account_id,
MAX(CASE WHEN c.accountId is null then p.account_id else null end) as incorrect_account_id
FROM a_table p LEFT JOIN another_table c ON p.account_id = c.accountID
GROUP BY p.visitor_id
I removed the distinct and added the group by. Group by p.visitor_id means you want one row per visitor_id.
I wrapped your two CASE expressions in MAX, which means we want the maximum value (within the visitor_id grouping) there. MIN would work just as well, because aggregate functions such as MAX and MIN skip NULLs. Either way, the intention here is to find a non-null value for the column.
The big assumption here is that for each visitor_id, you have at most one correct_account_id and at most one incorrect_account_id that you care about. If you have more than one for the same visitor_id, this will only get one of them (the max/min). (Given that you explicitly say you want one row per visitor_id, this seems like a safe assumption.)
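The collapsing effect of MAX over mostly-null rows can be sketched in SQLite via Python. The table `joined` and its values are hypothetical stand-ins for the output of the join in the question, not the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joined (visitor_id INTEGER, correct_id TEXT, incorrect_id TEXT)"
)
# Three sparse rows for one visitor, mirroring the result shown in the question.
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?)",
    [(1, None, "bad-1"), (1, "good-1", None), (1, None, None)],
)

collapsed = conn.execute("""
    SELECT visitor_id,
           MAX(correct_id)   AS correct_account_id,
           MAX(incorrect_id) AS incorrect_account_id,
           MIN(correct_id)   AS same_value_via_min
    FROM joined
    GROUP BY visitor_id
""").fetchall()
conn.close()

print(collapsed)
```

The visitor ends up on a single row with both ids filled in, and MIN picks the same non-null value as MAX because there is only one candidate per column.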

Efficient way to verify a table is a subset of another table

I have two tables A and B with exactly the same structure. I need to verify that A is a subset of B. Because the structure contains over 100 fields, I do not want to list them one by one in a WHERE predicate.
I would like to know if there is an easier way to do that.
Assumptions:
(1) Identical table structure of A and B. This means that both order of columns and their data types have to match.
(2) There are no duplicate rows in table A
Problem description
To prove that A is a subset of B you need to show that A\B = empty set.
Solution
This means that if you remove every row in A that has a matching row in B and your output is empty (0 rows) this means that A is subset of B.
If, on the other hand, the output has more than 0 rows, it means that A has rows that B doesn't, so A isn't a subset of B.
SELECT * FROM A
EXCEPT
SELECT * FROM B
When A is empty (contains 0 rows) it will be treated as a subset of B, because the result of above query will be 0 rows.
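As a rough check of the EXCEPT idea, the sketch below builds two tiny tables in SQLite from Python and tests whether the difference is empty. The two-column layout and the helper name `is_subset` are assumptions for illustration; the real tables have over 100 columns, which SELECT * covers without listing them.

```python
import sqlite3

def is_subset(rows_a, rows_b):
    """Return True when every row of A also appears in B (A \\ B is empty)."""
    conn = sqlite3.connect(":memory:")
    for name in ("A", "B"):
        conn.execute(f"CREATE TABLE {name} (x INTEGER, y TEXT)")
    conn.executemany("INSERT INTO A VALUES (?, ?)", rows_a)
    conn.executemany("INSERT INTO B VALUES (?, ?)", rows_b)
    leftover = conn.execute("SELECT * FROM A EXCEPT SELECT * FROM B").fetchall()
    conn.close()
    return len(leftover) == 0

b_rows = [(1, "a"), (2, "b")]
print(is_subset([(1, "a")], b_rows))  # every A row exists in B
print(is_subset([(3, "c")], b_rows))  # A has a row that B lacks
print(is_subset([], b_rows))          # an empty A counts as a subset
```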
@robertoplancarte's approach with a little tweaking:
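-- Number of distinct rows in B
with tB_cnt as
(
    SELECT COUNT(*) cnt FROM
    (
        SELECT DISTINCT * FROM dbo.T_B
    ) T_B
), TAB_cnt as
-- Number of distinct rows in B and A combined (UNION removes duplicates)
(
    SELECT count(*) cnt FROM
    (
        SELECT * FROM dbo.T_B
        UNION
        SELECT * FROM dbo.T_A
    ) T_AB
)
-- If adding A's rows to B does not increase the count, A is a subset of B
SELECT
    CASE WHEN TB_CNT.CNT = TAB_CNT.CNT THEN
        'Table A is subset of B'
    else
        'Table A is not subset of B'
    END as Result
FROM TAB_CNT, TB_CNT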

SQL query - Selecting distinct values from a table

I have a table in which I have multiple entries against a FK. I want to find the FK values which do not have certain entries, e.g.
my table has the following entries.
PK----------------FK-----------------Column entries
1----------------100-----------------ab1
2----------------100-----------------ab2
3----------------100-----------------ab4
4----------------200-----------------ab1
5----------------200-----------------ab2
6----------------200-----------------ab3
7----------------300-----------------ab1
8----------------300-----------------ab2
9----------------300-----------------ab3
10---------------300-----------------ab4
Now, from this table I want to filter all those FKs which do not have both ab3 and ab4 in them. Certainly, I expect distinct values, i.e. in this case the result would be FK = 100 and 200.
The query which i am using is
select distinct(FK)
from table1
where column_entries != 'ab3'
or column_entries != 'ab4';
Certainly, this query is not fetching the desired result.
Try the following:
select distinct fk_col from table1
minus
(select distinct fk_col from table1 where col_entry='ab3'
intersect
select distinct fk_col from table1 where col_entry='ab4')
This will show all those FKs which do not have 'ab3' and 'ab4'. i.e. 100 and 200 in your case
The script below may be the solution, if I understood your question correctly.
SELECT DISTINCT(TableForeignKey)
FROM Test
WHERE TableForeignKey NOT IN (
SELECT T1.TableForeignKey
FROM Test T1 INNER JOIN Test T2 ON T1.TableForeignKey = T2.TableForeignKey
WHERE T1.TableEntry = 'ab3' AND T2.TableEntry = 'ab4')
You could use GROUP BY with conditional aggregation in HAVING:
SELECT FK
FROM table1
GROUP BY FK
HAVING COUNT(CASE column_entries WHEN 'ab3' THEN 1 END) = 0
OR COUNT(CASE column_entries WHEN 'ab4' THEN 1 END) = 0
;
The two conditional aggregates count 'ab3' and 'ab4' entries separately. If both end up with results greater than 0, then the corresponding FK has both 'ab3' and 'ab4' and is thus not returned. If at least one of the counts evaluates to 0, then FK is considered satisfying the requirements.
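To check the HAVING version against the sample data from the question, here is a sketch that loads those ten rows into SQLite from Python. The column names `pk`, `fk` and `column_entries` are guesses based on the table header in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (pk INTEGER PRIMARY KEY, fk INTEGER, column_entries TEXT)"
)
rows = [
    (1, 100, "ab1"), (2, 100, "ab2"), (3, 100, "ab4"),
    (4, 200, "ab1"), (5, 200, "ab2"), (6, 200, "ab3"),
    (7, 300, "ab1"), (8, 300, "ab2"), (9, 300, "ab3"), (10, 300, "ab4"),
]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

# Keep an FK when it is missing 'ab3' or missing 'ab4' (i.e. it lacks the pair).
fks = [row[0] for row in conn.execute("""
    SELECT fk
    FROM table1
    GROUP BY fk
    HAVING COUNT(CASE column_entries WHEN 'ab3' THEN 1 END) = 0
        OR COUNT(CASE column_entries WHEN 'ab4' THEN 1 END) = 0
    ORDER BY fk
""")]
conn.close()

print(fks)
```

FK 300 is the only one holding both 'ab3' and 'ab4', so it is the only one dropped.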

Find Rows where the Same Two Column Values Recur

Given a table in SQL-Server like:
Id INTEGER
A VARCHAR(50)
B VARCHAR(50)
-- Some other columns
with no index on A or B, I wish to find rows where a unique combination of A and B occurs more than once.
I'm using the query
SELECT A+B, Count(A+B) FROM MyTable
GROUP BY A+B
HAVING COUNT(A+B) > 1
First Question
Is there a more time-efficient way to do this? (I cannot add indices to the database)
Second Question
When I attempt to gain some formatting of the output by including a , in the concatenation:
SELECT A+','+B, Count(A+','+B) FROM MyTable
GROUP BY A+','+B
HAVING COUNT(A+','+B) > 1
The query fails with the error
Column 'MyDB.dbo.MyTable.A' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
with a similar error for Column B.
How can I format the output to separate the two columns?
It would seem more natural to me to write:
SELECT A, B, Count(*) FROM MyTable
GROUP BY A, B
HAVING COUNT(*) > 1
And it's the most efficient way of doing it (and so is the query in the question).
Similarly to the above query, you can rewrite your second query:
SELECT A + ',' + B, Count(*) FROM MyTable
GROUP BY A, B
HAVING COUNT(*) > 1
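One further reason to prefer GROUP BY A, B, offered here as an observation rather than something the answer states: grouping on the concatenated string can merge different pairs that happen to concatenate to the same text. The sketch below shows this in SQLite via Python, where `||` plays the role of `+`; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER, A TEXT, B TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    # ('ab','c') and ('a','bc') differ as pairs but both concatenate to 'abc'.
    [(1, "ab", "c"), (2, "a", "bc"), (3, "x", "y"), (4, "x", "y")],
)

by_concat = conn.execute("""
    SELECT A || B, COUNT(*) FROM MyTable
    GROUP BY A || B
    HAVING COUNT(*) > 1
    ORDER BY 1
""").fetchall()

by_columns = conn.execute("""
    SELECT A, B, COUNT(*) FROM MyTable
    GROUP BY A, B
    HAVING COUNT(*) > 1
    ORDER BY A, B
""").fetchall()
conn.close()

print(by_concat)   # reports 'abc' as a duplicate as well as 'xy'
print(by_columns)  # reports only the genuinely repeated pair
```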

Selecting a single (random) row for an SQL join

I've got an sql query that selects data from several tables, but I only want to match a single(randomly selected) row from another table.
Easier to show some code, I guess ;)
Table K is (k_id, selected)
Table C is (c_id, image)
Table S is (c_id, date)
Table M is (c_id, k_id, score)
All ID-columns are primary keys, with appropriate FK constraints.
What I want, in English, is for each row in K that has selected = 1 to get a random row from C where there exists a row in M with (K_id, C_id), where the score is higher than a given value, and where c.image is not null and there is a row in s with c_id
Something like:
select k.k_id, c.c_id, m.score
from k,c,m,s
where k.selected = 1
and m.score > some_value
and m.k_id = k.k_id
and m.c_id = c.c_id
and c.image is not null
and s.c_id = c.c_id;
The only problem is this returns all the rows in C that match the criteria - I only want one...
I can see how to do it using PL/SQL to select all relevant rows into a collection and then select a random one, but I'm stuck as to how to select a random one.
You can use ORDER BY dbms_random.value with your query.
i.e.:
SELECT column FROM
(
SELECT column FROM table
ORDER BY dbms_random.value
)
WHERE rownum = 1
References:
http://awads.net/wp/2005/08/09/order-by-no-order/
http://www.petefreitag.com/item/466.cfm
With analytic functions:
SELECT k_id, c_id, score
FROM (SELECT k.k_id, c.c_id, m.score,
             row_number() over(PARTITION BY k.k_id ORDER BY NULL) rk
      FROM k, c, m, s
      WHERE k.selected = 1
        AND m.score > some_value
        AND m.k_id = k.k_id
        AND m.c_id = c.c_id
        AND c.image IS NOT NULL
        AND s.c_id = c.c_id)
WHERE rk = 1
This will select one row that satisfies your criteria per k_id. This will likely select the same set of rows if you run the query several times. If you want more randomness (each run produces a different set of rows), you would replace ORDER BY NULL with ORDER BY dbms_random.value.
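The one-random-row-per-k_id pattern can be tried outside Oracle with a sketch in SQLite via Python, assuming a SQLite build with window functions (3.25 or later). SQLite's random() stands in for dbms_random.value, explicit JOINs replace the comma joins, and all sample rows and the threshold of 3 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE k (k_id INTEGER PRIMARY KEY, selected INTEGER);
    CREATE TABLE c (c_id INTEGER PRIMARY KEY, image TEXT);
    CREATE TABLE s (c_id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE m (c_id INTEGER, k_id INTEGER, score INTEGER);
    INSERT INTO k VALUES (1, 1), (2, 1), (3, 0);
    INSERT INTO c VALUES (10, 'a.png'), (11, 'b.png'), (12, NULL);
    INSERT INTO s VALUES (10, '2020-01-01'), (11, '2020-01-02'), (12, '2020-01-03');
    INSERT INTO m VALUES (10, 1, 5), (11, 1, 7), (12, 1, 9),
                         (10, 2, 8), (11, 2, 1), (10, 3, 9);
""")

# Number the qualifying rows of each k_id in random order, then keep row 1.
picks = conn.execute("""
    SELECT k_id, c_id, score
    FROM (SELECT k.k_id, c.c_id, m.score,
                 ROW_NUMBER() OVER (PARTITION BY k.k_id ORDER BY random()) AS rk
          FROM k
          JOIN m ON m.k_id = k.k_id
          JOIN c ON c.c_id = m.c_id
          JOIN s ON s.c_id = c.c_id
          WHERE k.selected = 1
            AND m.score > ?
            AND c.image IS NOT NULL)
    WHERE rk = 1
    ORDER BY k_id
""", (3,)).fetchall()
conn.close()

print(picks)
```

With this data k_id 1 has two qualifying candidates (c_id 10 and 11), so its pick can vary between runs, while k_id 2 has a single candidate and k_id 3 is not selected at all.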
I'm not too familiar with oracle SQL, but try using LIMIT random(), if there is such a function available.