SQL query - select uncommon values from 2 tables

Today I was asked the following question in an interview for a QA position, and because my query was incorrect, I was not selected. Since then I have been itching to find the correct answer for the following scenario:
I was given the following 2 tables:
Table A    Table B
-------    -------
ID         ID
-------    -------
0          5
1          6
2          7
3          8
4          9
5          10
6
And the following output was expected using an SQL query:
**ID**
--------
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 7 |
| 8 |
| 9 |
| 10 |
--------
Thanks everyone. I really like this forum and from now on will be active here to learn more about SQL. I would like to make it a strong point rather than a weak one, so as not to get kicked out of other interviews. I know there is a long way to go. Besides all of your responses, I drafted the following query and would like to hear the experts' opinion of it (and the reasoning behind that opinion):
BTW, the query worked on MSSQLSRV-2008 (using UNION or UNION ALL made no difference to the result):
select ID from A where ID not in (5,6)
union
select ID from B where ID not in (5,6)
Is this really an efficient query?

If you want values in only one of two tables, I would use a full outer join and condition:
select coalesce(a.id, b.id)
from tableA a full outer join
tableB b
on a.id = b.id
where a.id is null or b.id is null;
Of course, if the job is at a company that uses MS Access or MySQL, then this isn't the right answer, because these systems don't support full outer join. You can also do this in more complicated ways, using union all and aggregation, or with other methods.
EDIT:
Here is another method:
select id
from (select a.id, 1 as isa, 0 as isb from tablea a union all
select b.id, 0, 1 from tableb b
) ab
group by id
having sum(isa) = 0 or sum(isb) = 0;
And another:
select a.id
from tablea a
where a.id not in (select id from tableb)
union all
select b.id
from tableb b
where b.id not in (select id from tablea);
As I think about this, it is a pretty good interview question (even though I've just given three reasonable answers).
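For engines without FULL OUTER JOIN (the MS Access / MySQL caveat above), the same rows can be collected with two LEFT JOIN anti-joins glued together by UNION ALL. The following is only a sketch: it is wrapped in Python's sqlite3 module purely so that it runs without a database server, the table names tablea and tableb follow this answer, and the rows are the sample data from the question.

```python
import sqlite3

# In-memory database holding the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER PRIMARY KEY);
    CREATE TABLE tableb (id INTEGER PRIMARY KEY);
    INSERT INTO tablea (id) VALUES (0),(1),(2),(3),(4),(5),(6);
    INSERT INTO tableb (id) VALUES (5),(6),(7),(8),(9),(10);
""")

# Each half keeps the rows of one table that have no partner in the other.
query = """
    SELECT a.id FROM tablea a LEFT JOIN tableb b ON b.id = a.id WHERE b.id IS NULL
    UNION ALL
    SELECT b.id FROM tableb b LEFT JOIN tablea a ON a.id = b.id WHERE a.id IS NULL
    ORDER BY 1
"""
only_in_one = [row[0] for row in conn.execute(query)]
print(only_in_one)  # [0, 1, 2, 3, 4, 7, 8, 9, 10]
```

UNION ALL is enough here because the two halves cannot overlap: an id is missing from at most one side.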

Edit: See Gordon's answer above for a better query; this is a very inefficient way of doing what you want.
I think this should do the trick:
(SELECT * FROM A WHERE NOT id IN (SELECT A.id FROM A, B WHERE A.id = B.id))
UNION
(SELECT * FROM B WHERE NOT id IN (SELECT A.id FROM A, B WHERE A.id = B.id))
You could avoid the duplication of SELECT A.id ... by using a temporary table.

without full outer joins...
Select id
from (Select id from tableA
Union all
Select id from tableB) Z
group by id
Having count(*) = 1
or using Except and Intersect .....
(Select id from tableA Except Select id from tableB)
Union
(Select id from tableB Except Select id from tableA)
or ....
(Select id from tableA union Select id from tableB)
Except
(Select id from tableA intersect Select id from tableB)
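One thing worth checking before picking between these variants: the count(*) = 1 version relies on each id appearing at most once per table, while the Except-based versions work on sets and do not. A small sketch of the difference, run through Python's sqlite3 module only for convenience. The duplicate row is a made-up illustration, not data from the question, and since SQLite does not accept parentheses around the operands of a compound select, each Except is wrapped in a subquery here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER);
    CREATE TABLE tableB (id INTEGER);
    -- 0 appears twice in tableA on purpose (hypothetical duplicate)
    INSERT INTO tableA (id) VALUES (0),(0),(1),(5);
    INSERT INTO tableB (id) VALUES (5),(7);
""")

count_based = """
    SELECT id FROM (SELECT id FROM tableA UNION ALL SELECT id FROM tableB) z
    GROUP BY id HAVING COUNT(*) = 1 ORDER BY id
"""
except_based = """
    SELECT id FROM (SELECT id FROM tableA EXCEPT SELECT id FROM tableB)
    UNION
    SELECT id FROM (SELECT id FROM tableB EXCEPT SELECT id FROM tableA)
    ORDER BY id
"""
by_count = [r[0] for r in conn.execute(count_based)]
by_except = [r[0] for r in conn.execute(except_based)]
print(by_count)   # [1, 7]     id 0 is lost because it occurs twice in tableA
print(by_except)  # [0, 1, 7]
```

If duplicates are possible, selecting DISTINCT ids in each branch of the Union all should bring the count-based query back in line.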

Related

How to make a comparison for the record that has rows to another rows? [closed]

I have a first table with these columns:
1. id
2. key
3. value
And a second table (more like a list):
key
I need to get the distinct ids that have all the keys from the second table.
I have tried a self join, but it is very slow. I also tried COUNT = COUNT, but the performance is the same.
Self join:
select f.id from first f
join first f2 on f.id = f2.id AND f2.key = f.key
COUNT:
select a.key from #a a
where ( select SUM(CASE WHEN s.[key] is not NULL THEN 1 ELSE 0 END) from [b] b
LEFT JOIN Second s on s.[key] = b.[Key]
where b.[Key] = a.[key]) = #KeyCount
You can also try this:
SELECT A.id
FROM TAB1 A
INNER JOIN TAB2 B ON A.[key] = B.[Key]
GROUP BY A.id
HAVING COUNT(DISTINCT A.[key])
= (SELECT COUNT(DISTINCT [Key]) FROM TAB2)
This is somewhat of a stab in the dark, but perhaps this is what you're after...?
SELECT I.ID
FROM TableB B
CROSS APPLY (SELECT DISTINCT ca.ID
FROM dbo.TableA ca) I
LEFT JOIN TableA A ON B.[key] = A.[key]
AND I.ID = A.ID
GROUP BY I.ID
HAVING COUNT(CASE WHEN A.[Key] IS NULL THEN 1 END) = 0;
Assuming:
Your 2nd table lists all possible keys, and
Your first table (containing entity IDs, keys, and key values) can contain each entity-key combination only once,
something like this may work:
SELECT [id], COUNT(*)
FROM Table1
GROUP BY [id]
HAVING COUNT(*) = (SELECT COUNT(*) FROM keys)
Now with some sample data. Assume the following keys:
+--------+----------+
| key_id | key_name |
+--------+----------+
| 1 | Key1 |
| 2 | Key2 |
| 3 | Key3 |
+--------+----------+
And the following entities:
+----+-----+-------+
| id | key | value |
+----+-----+-------+
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 3 | 3 |
| 2 | 2 | 2 |
| 2 | 3 | 3 |
+----+-----+-------+
Note that Entity 1 has all keys, but Entity 2 is missing Key 1. So, as expected, the query returns only Entity 1.
You can use aggregation for the counting:
select f.id
from first f
where exists (select 1 from second s where s.key = f.key)
group by f.id
having count(*) = (select count(*) from second);
This assumes that there are no duplicates in the table. It also assumes that extra keys in first are ok. If not, use left join:
select f.id
from first f left join
second s
on s.key = f.key
group by f.id
having count(s.key) = (select count(*) from second) and
count(*) = count(s.key);
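The difference between the two queries only shows up when an id carries a key that is not in the second table. A sketch of that case, run through Python's sqlite3 module for convenience. The table names entity_keys and required_keys and the column key_id are stand-ins chosen here (to avoid quoting first, second and key), and entity 3 with its extra key 4 is an invented row added to the sample data from the earlier answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE required_keys (key_id INTEGER PRIMARY KEY);
    CREATE TABLE entity_keys (id INTEGER, key_id INTEGER, value INTEGER,
                              PRIMARY KEY (id, key_id));
    INSERT INTO required_keys (key_id) VALUES (1),(2),(3);
    INSERT INTO entity_keys (id, key_id, value) VALUES
        (1,1,1),(1,2,2),(1,3,3),           -- has every required key
        (2,2,2),(2,3,3),                   -- misses key 1
        (3,1,1),(3,2,2),(3,3,3),(3,4,4);   -- every required key plus an extra one
""")

extras_allowed = """
    SELECT e.id FROM entity_keys e
    WHERE EXISTS (SELECT 1 FROM required_keys r WHERE r.key_id = e.key_id)
    GROUP BY e.id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM required_keys)
    ORDER BY e.id
"""
extras_rejected = """
    SELECT e.id FROM entity_keys e
    LEFT JOIN required_keys r ON r.key_id = e.key_id
    GROUP BY e.id
    HAVING COUNT(r.key_id) = (SELECT COUNT(*) FROM required_keys)
       AND COUNT(*) = COUNT(r.key_id)
    ORDER BY e.id
"""
lenient = [r[0] for r in conn.execute(extras_allowed)]
strict = [r[0] for r in conn.execute(extras_rejected)]
print(lenient)  # [1, 3]
print(strict)   # [1]
```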

Comparing different columns in SQL for each row

After some transformation, I have a result from a cross join (of tables a and b) on which I want to do some analysis. The table looks like this:
+-----+------+------+------+------+-----+------+------+------+------+
| id | 10_1 | 10_2 | 11_1 | 11_2 | id | 10_1 | 10_2 | 11_1 | 11_2 |
+-----+------+------+------+------+-----+------+------+------+------+
| 111 | 1 | 0 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
| 111 | 1 | 0 | 1 | 0 | 333 | 0 | 0 | 0 | 0 |
| 111 | 1 | 0 | 1 | 0 | 444 | 1 | 0 | 1 | 1 |
| 112 | 0 | 1 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
+-----+------+------+------+------+-----+------+------+------+------+
The ids in the first column are different from the ids in the sixth column.
Each row always holds two different IDs that are matched with each other. The other columns always have either 0 or 1 as a value.
I am now trying to find out how many values two IDs have in common on average (meaning both have "1" in 10_1, 10_2, etc.), but I don't really know how to do so.
I was trying something like this as a start:
SELECT SUM(CASE WHEN a.10_1 = 1 AND b.10_1 = 1 then 1 end)
But this would obviously only count how often two ids have 10_1 in common. I could make something like this for example for different columns:
SELECT SUM(CASE WHEN (a.10_1 = 1 AND b.10_1 = 1)
OR (a.10_2 = 1 AND b.10_2 = 1) OR [...] then 1 end)
That would count in general how often two IDs have one thing in common, but it would of course also count the cases where they have two or more things in common. Plus, I would also like to know how often two IDs have two things, three things, etc. in common.
One "problem" in my case is that I have ~30 columns I want to look at, so I can hardly write down every possible combination for each case.
Does anyone know how I can approach my problem in a better way?
Thanks in advance.
Edit:
A possible result could look like this:
+-----------+---------+
| in_common | count |
+-----------+---------+
| 0 | 100 |
| 1 | 500 |
| 2 | 1500 |
| 3 | 5000 |
| 4 | 3000 |
+-----------+---------+
With the codes as column names, you're going to have to write some code that explicitly references each column name. To keep that to a minimum, you could write those references in a single union statement that normalizes the data, such as:
-- tbl is a placeholder name for the wide source table
select id, '10_1' as code from tbl where "10_1" = 1
union
select id, '10_2' from tbl where "10_2" = 1
union
select id, '11_1' from tbl where "11_1" = 1
union
select id, '11_2' from tbl where "11_2" = 1;
This needs to be modified to include whatever additional columns you need to link up different IDs. For the purpose of this illustration, I assume the following data model:
create table p (
id integer not null primary key,
sex character(1) not null,
age integer not null
);
create table t1 (
id integer not null,
code character varying(4) not null,
constraint pk_t1 primary key (id, code)
);
Though your data evidently does not currently resemble this structure, normalizing your data into a form like this would allow you to apply the following solution to summarize your data in the desired form.
select
in_common,
count(*) as count
from (
select
count(*) as in_common
from (
select
a.id as a_id, a.code,
b.id as b_id, b.code
from
(select p.*, t1.code
from p left join t1 on p.id=t1.id
) as a
inner join (select p.*, t1.code
from p left join t1 on p.id=t1.id
) as b on b.sex <> a.sex and b.age between a.age-10 and a.age+10
where
a.id < b.id
and a.code = b.code
) as c
group by
a_id, b_id
) as summ
group by
in_common;
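Note that the inner join on a.code = b.code drops every pair with nothing in common, so a row for in_common = 0 (which the expected result in the question includes) never appears. One possible way around that, sketched with Python's sqlite3 module for convenience: keep the pairs in their own table and LEFT JOIN the codes onto them. The pairs table is an assumption made for this sketch and stands in for however the IDs are matched up. The codes come from the four rows shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER NOT NULL, code TEXT NOT NULL, PRIMARY KEY (id, code));
    -- hypothetical table listing the ID pairs to compare
    CREATE TABLE pairs (a_id INTEGER NOT NULL, b_id INTEGER NOT NULL);
    INSERT INTO t1 (id, code) VALUES
        (111, '10_1'), (111, '11_1'),
        (112, '10_2'), (112, '11_1'),
        (222, '10_1'), (222, '11_1'),
        (444, '10_1'), (444, '11_1'), (444, '11_2');  -- 333 has no codes at all
    INSERT INTO pairs (a_id, b_id) VALUES (111, 222), (111, 333), (111, 444), (112, 222);
""")

query = """
    SELECT in_common, COUNT(*) AS n
    FROM (
        -- COUNT(b.code) ignores NULLs, so unmatched pairs count as 0
        SELECT p.a_id, p.b_id, COUNT(b.code) AS in_common
        FROM pairs p
        LEFT JOIN t1 a ON a.id = p.a_id
        LEFT JOIN t1 b ON b.id = p.b_id AND b.code = a.code
        GROUP BY p.a_id, p.b_id
    ) per_pair
    GROUP BY in_common
    ORDER BY in_common
"""
histogram = conn.execute(query).fetchall()
print(histogram)  # [(0, 1), (1, 1), (2, 2)]
```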
The proposed solution requires first to take one step back from the cross-join table, as the identical column names are super annoying. Instead, we take the ids from the two tables and put them in a temporary table. The following query gets the result wanted in the question. It assumes table_a and table_b from the question are the same and called tbl, but this assumption is not needed and tbl can be replaced by table_a and table_b in the two sub-SELECT queries. It looks complicated and uses the JSON trick to flatten the columns, but it works here:
WITH idtable AS (
SELECT a.id as id_1, b.id as id_2 FROM
-- put cross join of table a and table b here
)
SELECT in_common,
count(*)
FROM
(SELECT idtable.*,
sum(CASE
WHEN meltedR.value::text=meltedL.value::text THEN 1
ELSE 0
END) AS in_common
FROM idtable
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_a
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedL ON (idtable.id_1 = meltedL.id)
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_b
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedR ON (idtable.id_2 = meltedR.id
AND meltedL.key = meltedR.key)
GROUP BY idtable.id_1,
idtable.id_2) tt
GROUP BY in_common ORDER BY in_common;
The output here looks like this:
in_common | count
-----------+-------
2 | 2
3 | 1
4 | 1
(3 rows)

Get count of foreign key from multiple tables

I have 3 tables, with Table B & C referencing Table A via Foreign Key. I want to write a query in PostgreSQL to get all ids from A and also their total occurrences from B & C.
a | b | c
-----------------------------------
id | txt | id | a_id | id | a_id
---+---- | ---+----- | ---+------
1 | a | 1 | 1 | 1 | 3
2 | b | 2 | 1 | 2 | 4
3 | c | 3 | 3 | 3 | 4
4 | d | 4 | 4 | 4 | 4
Output desired (just the id from A & total count in B & C) :
id | Count
---+-------
1 | 2 -- twice in B
2 | 0 -- occurs nowhere
3 | 2 -- once in B & once in C
4 | 4 -- once in B & thrice in C
SQL so far:
SELECT a_id, COUNT(a_id)
FROM
( SELECT a_id FROM b
UNION ALL
SELECT a_id FROM c
) AS union_table
GROUP BY a_id
The query I wrote fetches from B & C and counts the occurrences. But if the key doesn't occur in B or C, it doesn't show up in the output (e.g. id=2 in the output). How can I start my selection from table A and join/union B & C to get the desired output?
If the query involves large parts of b and / or c it is more efficient to aggregate first and join later.
I expect these two variants to be considerably faster:
SELECT a.id
, COALESCE(b.ct, 0) + COALESCE(c.ct, 0) AS bc_ct
FROM a
LEFT JOIN (SELECT a_id, count(*) AS ct FROM b GROUP BY 1) b ON b.a_id = a.id
LEFT JOIN (SELECT a_id, count(*) AS ct FROM c GROUP BY 1) c ON c.a_id = a.id;
You need to account for the possibility that some a.id are not present at all in b and / or c. count() never returns NULL, but that's cold comfort in the face of LEFT JOIN, which leaves you with NULL values for missing rows nonetheless. You must prepare for NULL. Use COALESCE().
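To see the NULL handling concretely, here is a sketch of the aggregate-first query on the question's sample data, once with and once without COALESCE(). It is run through Python's sqlite3 module only so that it is self-contained. The join conditions are written as ON b.a_id = a.id because table a names its key id rather than a_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, txt TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    INSERT INTO a VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d');
    INSERT INTO b VALUES (1,1),(2,1),(3,3),(4,4);
    INSERT INTO c VALUES (1,3),(2,4),(3,4),(4,4);
""")

# {total} is filled in with the expression that combines the two counts.
template = """
    SELECT a.id, {total} AS bc_ct
    FROM a
    LEFT JOIN (SELECT a_id, COUNT(*) AS ct FROM b GROUP BY a_id) b ON b.a_id = a.id
    LEFT JOIN (SELECT a_id, COUNT(*) AS ct FROM c GROUP BY a_id) c ON c.a_id = a.id
    ORDER BY a.id
"""
with_coalesce = conn.execute(
    template.format(total="COALESCE(b.ct, 0) + COALESCE(c.ct, 0)")).fetchall()
without_coalesce = conn.execute(
    template.format(total="b.ct + c.ct")).fetchall()
print(with_coalesce)     # [(1, 2), (2, 0), (3, 2), (4, 4)]
print(without_coalesce)  # [(1, None), (2, None), (3, 2), (4, 4)]
```

Without COALESCE() a single missing side turns the whole sum into NULL, which is why id 1 loses its two rows from b.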
Or UNION ALL a_id from both tables, aggregate, then JOIN:
SELECT a.id
, COALESCE(ct.bc_ct, 0) AS bc_ct
FROM a
LEFT JOIN (
SELECT a_id, count(*) AS bc_ct
FROM (
SELECT a_id FROM b
UNION ALL
SELECT a_id FROM c
) bc
GROUP BY 1
) ct ON ct.a_id = a.id;
Probably slower, but still faster than the solutions presented so far. You could do without COALESCE() and still not lose any rows; in that case you might get occasional NULL values for bc_ct.
Another option:
SELECT
a.id,
(SELECT COUNT(*) FROM b WHERE b.a_id = a.id) +
(SELECT COUNT(*) FROM c WHERE c.a_id = a.id)
FROM
a
Use left join with a subquery:
SELECT a.id, COUNT(x.id)
FROM a
LEFT JOIN (
SELECT id, a_id FROM b
UNION ALL
SELECT id, a_id FROM c
) x ON (a.id = x.a_id)
GROUP BY a.id;

TSQL - retrieve results from table A that contains exact data contained in table B

I have TableA (id bigint, name varchar) and TableB (name varchar) that contains the following data:
Table A: Table B: Results:
------------- --------- ---------------
| 1 | "A" | | "A" | | 1 | "A" |
| 1 | "B" | | "B" | | 1 | "B" |
| 2 | "A" | --------- | 4 | "A" |
| 3 | "B" | | 4 | "B" |
| 4 | "A" | ---------------
| 4 | "B" |
-------------
I want to return results from TableA that contains an EXACT match of what's in table B.
Using the 'IN' clause only retrieves rows where a name occurs, not an exact match.
As another example, if TableB has only "A", I want it to return: 2-"A"
I understand your question, but it is a tricky one, as it is not exactly in line with relational logic. You are looking for ids for which SELECT name FROM TableA WHERE id IN ... ORDER BY name; is identical to SELECT name FROM B ORDER BY name;.
Can you assume that A(id,name) is unique and B(name) is unique? Better said, are there constraints like that or can you set them up?
If yes, here is a solution:
1. Get rid of ids in A with rows not matching the rows in B
SELECT id, A.name FROM A WHERE id NOT IN
(SELECT id FROM A LEFT JOIN B ON A.name = B.name WHERE B.name IS NULL);
2. Count rows per each id (this is why the unique constraints are necessary)
SELECT id, COUNT(*) FROM
(
SELECT id, A.name FROM A WHERE id NOT IN
(SELECT id FROM A LEFT JOIN B ON A.name = B.name WHERE B.name IS NULL)
) t
GROUP BY id;
3. Only retain those that match the number of rows of B.
SELECT id, COUNT(*) FROM
(
SELECT id, A.name FROM A WHERE id NOT IN
(SELECT id FROM A LEFT JOIN B ON A.name = B.name WHERE B.name IS NULL)
) t
GROUP BY id
HAVING COUNT(*) = (SELECT COUNT(*) FROM B);
This works in SQL Server
select * from TableA a
where
(select count(*) from TableB) = (select count(*) from TableA where id = a.id) and
(select count(*) from TableB) =
(
select count(*) from
(
select name from TableA where id = a.id
intersect
select name from TableB
) as b
)
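Both scenarios from the question (TableB holding "A" and "B", then only "A") can be checked against one query. The sketch below uses a LEFT JOIN with two counts rather than either query above, and runs through Python's sqlite3 module for convenience. It assumes (id, name) is unique in TableA and name is unique in TableB. The helper exact_matches is a name made up for this illustration.

```python
import sqlite3

QUERY = """
    SELECT t.id, t.name FROM TableA t
    WHERE t.id IN (
        SELECT a.id
        FROM TableA a LEFT JOIN TableB b ON b.name = a.name
        GROUP BY a.id
        HAVING COUNT(*) = COUNT(b.name)                       -- no name outside TableB
           AND COUNT(b.name) = (SELECT COUNT(*) FROM TableB)  -- every TableB name present
    )
    ORDER BY t.id, t.name
"""

def exact_matches(b_names):
    """Return the TableA rows whose id carries exactly the names in b_names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (id INTEGER NOT NULL, name TEXT NOT NULL, UNIQUE (id, name));
        CREATE TABLE TableB (name TEXT NOT NULL UNIQUE);
        INSERT INTO TableA (id, name) VALUES
            (1,'A'),(1,'B'),(2,'A'),(3,'B'),(4,'A'),(4,'B');
    """)
    conn.executemany("INSERT INTO TableB (name) VALUES (?)", [(n,) for n in b_names])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

print(exact_matches(["A", "B"]))  # [(1, 'A'), (1, 'B'), (4, 'A'), (4, 'B')]
print(exact_matches(["A"]))       # [(2, 'A')]
```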

SQL Query - Displaying the same column twice under different conditions

I am wondering if the following is possible. Say I have the following table:
ID | NAME
1 | John
2 | Bob
3 | John
4 | Bob
Is it possible to run a query that results in the following:
NAME| ID1 | ID2
John | 1 | 3
Bob | 2 | 4
EDIT
Sorry for the confusion. My question addresses instances where I need to handle the possibility of 2 duplicates for a large data set.
Assuming exactly 2 duplicates
SELECT
NAME,
MIN(ID) as ID1,
MAX(ID) as ID2
FROM Table t
GROUP BY NAME
This should work. Note that the subquery screens out all names that don't have exactly two ids.
select name,min(id) as id1,max(id) as id2
from table
join(
select name
from table
group by name
having count(1)=2
)names
using(name)
group by name;
If there are exactly two rows with each name, then the following should work:
SELECT a.name,
a.id as id1,
b.id as id2
FROM the_table a
JOIN the_table b ON a.name = b.name AND a.id < b.id
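With a self join like this, the comparison operator between the two ids matters: <> returns every pair in both orders, while < keeps one row per pair with the smaller id first. A quick sketch on the question's four rows, run through Python's sqlite3 module only for convenience:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE the_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO the_table (id, name) VALUES (1,'John'),(2,'Bob'),(3,'John'),(4,'Bob');
""")

# {op} is the comparison applied to the two ids of a pair.
template = """
    SELECT a.name, a.id AS id1, b.id AS id2
    FROM the_table a
    JOIN the_table b ON a.name = b.name AND a.id {op} b.id
    ORDER BY a.id, b.id
"""
both_orders = conn.execute(template.format(op="<>")).fetchall()
one_per_pair = conn.execute(template.format(op="<")).fetchall()
print(both_orders)   # [('John', 1, 3), ('Bob', 2, 4), ('John', 3, 1), ('Bob', 4, 2)]
print(one_per_pair)  # [('John', 1, 3), ('Bob', 2, 4)]
```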