Find values where related must have list of values - sql

I'm trying to find a simple solution for my SQL Server problem.
I have two tables that look like this:
table1
--id
-- data
table2
--id
--table1_id
--value
I have some records like this:
Table1
+-----------------------+
| id | data |
+-----------------------+
| 1 | ? |
+-----------------------+
| 2 | ? |
+-----------------------+
Table2
+-----------------------+
|id | table1_id | value |
+-----------------------+
| 1 | 1 | 'a' |
+-----------------------+
| 2 | 1 | 'b' |
+-----------------------+
| 3 | 2 | 'a' |
+-----------------------+
Now I want to get the rows of table1, with all their additional values, where the related rows in table2 include both 'a' AND 'b' as values.
So I would get the id 1 of table1.
Currently I have a query like this:
SELECT t1.[id], t1.[data]
FROM [table1] t1,
(SELECT t1.[id]
FROM [table1] t1
JOIN [table2] t2 ON t1.[id] = t2.[table1_id] AND t2.[Value] IN('a', 'b')
GROUP BY t1.[id]
HAVING COUNT(t2.[Value]) = 2) x
WHERE t1.id = x.id
Does anyone have an idea how to achieve my goal in a simpler way?

One way uses exists:
select t1.*
from table1 t1
where exists (select 1
from table2 t2
where t2.table1_id = t1.id and t2.value = 'a'
) and
exists (select 1
from table2 t2
where t2.table1_id = t1.id and t2.value = 'b'
);
This can take advantage of an index on table2(table1_id, value).
You could also write:
select t1.*
from table1 t1
where (select count(distinct t2.value)
from table2 t2
where t2.table1_id = t1.id and t2.value in ('a', 'b')
) = 2 ;
This would probably also have very good performance with the index, if table2 doesn't have duplicates.
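As a rough check, the two queries above can be run against the sample rows from the question. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, the data values 'x' and 'y' are placeholders (the question shows them as '?'), and the index name ix_table2 is made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, value TEXT);
    -- 'x' and 'y' are placeholder data values; the question does not give them.
    INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO table2 VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'a');
    -- Hypothetical index name; the column order follows the answer's suggestion.
    CREATE INDEX ix_table2 ON table2 (table1_id, value);
""")

# Variant 1: one EXISTS per required value.
exists_sql = """
    SELECT t1.id
    FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2
                  WHERE t2.table1_id = t1.id AND t2.value = 'a')
      AND EXISTS (SELECT 1 FROM table2 t2
                  WHERE t2.table1_id = t1.id AND t2.value = 'b')
"""

# Variant 2: count the distinct matching values per table1 row.
count_sql = """
    SELECT t1.id
    FROM table1 t1
    WHERE (SELECT COUNT(DISTINCT t2.value)
           FROM table2 t2
           WHERE t2.table1_id = t1.id AND t2.value IN ('a', 'b')) = 2
"""

exists_ids = [row[0] for row in conn.execute(exists_sql)]
count_ids = [row[0] for row in conn.execute(count_sql)]
print(exists_ids, count_ids)
```

On this sample data both variants should select only id 1, since id 2 is related to 'a' but not to 'b'.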

SELECT T1.[id], T1.[data]
FROM table1 AS T1
JOIN table2 AS T2
ON T1.[id]=T2.[table1_id]
JOIN table2 AS T3
ON T1.[id]=T3.[table1_id]
WHERE
T2.[Value] ='a'
AND T3.[Value] = 'b'
As Gordon Linoff suggested, using EXISTS works as well and may perform better, depending on the data you are working with.

You have to do several steps to solve the problem:
Establish which table2 records are related to table1 and have the value 'a' or 'b', and eliminate the repeated ones with GROUP BY (InfoRelationate).
Validate that only the ids related to both 'a' and 'b' are kept, by means of a count over the result above (ValidateAYB).
Join the validated ids back to table1 to get the data that meets the condition.
This query meets the conditions:
with InfoRelationate as
(
select Table2.table1_id,value
from Table2 inner join
Table1 on Table2.table1_id=Table1.id and Table2.value IN('a', 'b')
group by Table2.table1_id,value
),
ValidateAYB as
(
select InfoRelationate.table1_id
from InfoRelationate
group by InfoRelationate.table1_id
having count (1)=2
)
select Table1.id,Table1.data
from Table1
inner join ValidateAYB on Table1.id=ValidateAYB.table1_id

Related

Using the same table alias twice in a query

My coworker, who is new to ANSI join syntax, recently wrote a query like this:
SELECT count(*)
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b)
JOIN table3 t3 ON
(t3.col_c = t1.col_c);
Note that table3 is joined to both table1 and table2 on different columns, but the two JOIN clauses use the same table alias for table3.
The query runs, but I'm unsure of its validity. Is this a valid way of writing this query?
I thought the join should be like this:
SELECT count(*)
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b AND
t3.col_c = t1.col_c);
Are the two versions functionally identical? I don't really have enough data in our database yet to be sure.
Thanks.
The first query is a join of 4 tables, the second one is a join of 3 tables. So I don't expect both queries to return the same number of rows.
SELECT *
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b)
JOIN table3 t3 ON
(t3.col_c = t1.col_c);
The alias t3 is only used in the ON clause. The alias t3 refers to the table before the ON keyword. I found this out by experimenting. So the previous query is equivalent to
SELECT *
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b)
JOIN table3 t4 ON
(t4.col_c = t1.col_c);
and this can be transformed into a traditional join
SELECT *
FROM table1 t1,
table2 t2,
table3 t3,
table3 t4
where (t1.col_a = t2.col_a)
and (t2.col_b = t3.col_b)
and (t4.col_c = t1.col_c);
The second query is
SELECT *
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b AND
t3.col_c = t1.col_c);
This can also be transformed into a traditional join
SELECT *
FROM table1 t1,
table2 t2,
table3 t3
where (t1.col_a = t2.col_a)
and (t2.col_b = t3.col_b)
AND (t3.col_c = t1.col_c);
These queries seem to be different. To prove their difference we use the following example:
create table table1(
col_a number,
col_c number
);
create table table2(
col_a number,
col_b number
);
create table table3(
col_b number,
col_c number
);
insert into table1(col_a, col_c) values(1,3);
insert into table1(col_a, col_c) values(4,3);
insert into table2(col_a, col_b) values(1,2);
insert into table2(col_a, col_b) values(4,2);
insert into table3(col_b, col_c) values(2,3);
insert into table3(col_b, col_c) values(2,5);
insert into table3(col_b, col_c) values(7,9);
commit;
We get the following output
SELECT *
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b)
JOIN table3 t3 ON
(t3.col_c = t1.col_c)
| COL_A | COL_C | COL_A | COL_B | COL_B | COL_C | COL_B | COL_C |
|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | 3 | 1 | 2 | 2 | 3 | 2 | 3 |
| 4 | 3 | 4 | 2 | 2 | 3 | 2 | 3 |
| 1 | 3 | 1 | 2 | 2 | 5 | 2 | 3 |
| 4 | 3 | 4 | 2 | 2 | 5 | 2 | 3 |
SELECT *
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b AND
t3.col_c = t1.col_c)
| COL_A | COL_C | COL_A | COL_B | COL_B | COL_C |
|-------|-------|-------|-------|-------|-------|
| 4 | 3 | 4 | 2 | 2 | 3 |
| 1 | 3 | 1 | 2 | 2 | 3 |
The number of rows retrieved is different and so count(*) is different.
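The comparison above can be reproduced with a short script. This is a sketch under assumptions: it uses Python's sqlite3 module with an in-memory SQLite database instead of Oracle, INTEGER columns instead of number, and it writes the first query with the distinct aliases t3 and t4 (the equivalent form derived earlier), because the duplicate-alias behaviour described here was observed on Oracle and is not assumed to carry over to SQLite.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col_a INTEGER, col_c INTEGER);
    CREATE TABLE table2 (col_a INTEGER, col_b INTEGER);
    CREATE TABLE table3 (col_b INTEGER, col_c INTEGER);
    INSERT INTO table1 VALUES (1, 3), (4, 3);
    INSERT INTO table2 VALUES (1, 2), (4, 2);
    INSERT INTO table3 VALUES (2, 3), (2, 5), (7, 9);
""")

# First query, rewritten with distinct aliases: table3 is joined twice.
four_table_count = conn.execute("""
    SELECT COUNT(*)
    FROM table1 t1
    JOIN table2 t2 ON t1.col_a = t2.col_a
    JOIN table3 t3 ON t2.col_b = t3.col_b
    JOIN table3 t4 ON t4.col_c = t1.col_c
""").fetchone()[0]

# Second query: table3 is joined once, with both conditions combined.
three_table_count = conn.execute("""
    SELECT COUNT(*)
    FROM table1 t1
    JOIN table2 t2 ON t1.col_a = t2.col_a
    JOIN table3 t3 ON t2.col_b = t3.col_b AND t3.col_c = t1.col_c
""").fetchone()[0]

print(four_table_count, three_table_count)
```

With the sample rows this should print 4 and 2, matching the two result tables shown above.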
The usage of the aliases was surprising, at least for me.
The following query works because t1 in the where_clause references table2.
select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_b<0;
The following query works because t1 in the where_clause references table1.
select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_c<0;
The following query raises an error because both table1 and table2 contain a column col_a.
select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_a<0;
The error thrown is
ORA-00918: column ambiguously defined
The following query works; here the alias t1 refers to two different tables in the same where_clause.
select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_b<0 and t1.col_c<0;
These and more examples can be found here: http://sqlfiddle.com/#!4/84feb/12
The smallest counterexample
The smallest counterexample is
table1
col_a col_c
1 2
table2
col_a col_b
1 3
table3
col_b col_c
3 5
6 2
Here the second query has an empty result set and the first query returns one row. It can be shown that the count(*) of the second query never exceeds the count(*) of the first query.
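The minimal counterexample can be checked the same way. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in for Oracle, uses the distinct aliases t3 and t4 for the first query, and introduces a hypothetical helper join_counts that is not part of the original answer.

```python
import sqlite3

def join_counts(t1_rows, t2_rows, t3_rows):
    """Return (count of the four-table join, count of the three-table join)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (col_a INTEGER, col_c INTEGER);
        CREATE TABLE table2 (col_a INTEGER, col_b INTEGER);
        CREATE TABLE table3 (col_b INTEGER, col_c INTEGER);
    """)
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", t1_rows)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", t2_rows)
    conn.executemany("INSERT INTO table3 VALUES (?, ?)", t3_rows)
    # First query: table3 appears twice (aliases t3 and t4).
    first = conn.execute("""
        SELECT COUNT(*)
        FROM table1 t1
        JOIN table2 t2 ON t1.col_a = t2.col_a
        JOIN table3 t3 ON t2.col_b = t3.col_b
        JOIN table3 t4 ON t4.col_c = t1.col_c
    """).fetchone()[0]
    # Second query: a single table3 row must satisfy both conditions.
    second = conn.execute("""
        SELECT COUNT(*)
        FROM table1 t1
        JOIN table2 t2 ON t1.col_a = t2.col_a
        JOIN table3 t3 ON t2.col_b = t3.col_b AND t3.col_c = t1.col_c
    """).fetchone()[0]
    conn.close()
    return first, second

first, second = join_counts([(1, 2)], [(1, 3)], [(3, 5), (6, 2)])
print(first, second)
```

The intuition behind the "never exceeds" claim is that every row of the three-table join corresponds to a row of the four-table join in which both table3 instances are the same row, so the first count is at least as large as the second.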
A more detailed explanation
This behaviour will become clearer if we analyze the following statement in detail.
SELECT t.col_b, t.col_c
FROM table1 t
JOIN table2 t ON
(t.col_b = t.col_c) ;
Here is the reduced syntax for this query in Backus–Naur form, derived from the syntax descriptions in the SQL Language Reference of Oracle 12.2. Note that under each syntax diagram there is a link to the Backus–Naur form of this diagram, e.g. Description of the illustration select.eps. "Reduced" means that I left out all the possibilities that were not used, e.g. the select is defined as
select::=subquery [ for_update_clause ] ;
Our query does not use the optional for_update_clause, so I reduced the rule to
select::=subquery
The only exception is the optional where_clause. I didn't remove it, so that these reduced rules can be used to analyze the above query even if we add a where_clause.
These reduced rules define only a subset of all possible select statements.
select::=subquery
subquery::=query_block
query_block::=SELECT select_list FROM join_clause [ where_clause ]
join_clause::=table_reference inner_cross_join_clause ...
table_reference::=query_table_expression t_alias
query_table_expression::=table
inner_cross_join_clause::=JOIN table_reference ON condition
So our select statement is a query_block and the join_clause is of type
table_reference inner_cross_join_clause
where table_reference is table1 t and inner_cross_join_clause is JOIN table2 t ON (t.col_b = t.col_c). The ellipsis ... means that there could be additional inner_cross_join_clauses, but we do not need this here.
In the inner_cross_join_clause the alias t refers to table2. Only if a reference cannot be satisfied there must the alias be searched for in an outer scope. So all of the following expressions in the ON condition are valid:
t.col_b = t.col_c
Here t.col_b is table2.col_b because t refers to the alias of its inner_cross_join_clause, and t.col_c is table1.col_c. The t of the inner_cross_join_clause (referring to table2) has no column col_c, so the outer scope is searched and an appropriate alias is found there.
If we have the clause
t.col_a = t.col_a
the alias can be resolved to the one defined in the inner_cross_join_clause to which this ON condition belongs, so t will be resolved to table2.
If the select list consists of
t.col_c, t.col_b, t.col_a
instead of * then the join_clause will be searched for an alias and t.col_c will be resolved to table1.col_c (table2 does not contain a column col_c), t.col_b will be resolved to table2.col_b (table1 does not contain a col_b) but t.col_a will raise the error
ORA-00918: column ambiguously defined
because for the select_list none of the alias definitions has precedence over the other. If our query also has a where_clause, then the aliases there are resolved in the same way as in the select_list.
With more data, the two queries will produce different results.
Your colleague's query is analogous to this:
select * from table3 t3 where t3.col_b = 'XX'
union
select * from table3 t3 where t3.col_c = 'YY'
or
select * from table3 t3 where t3.col_b = 'XX' or t3.col_c = 'YY'
while your query is like this:
select * from table3 t3 where t3.col_b ='XX' and t3.col_c='YY'
The first one is like data where (xx or yy), while the second one is data where (xx and yy).

How can I do the following query with Oracle SQL?

------------------
| **table 1** |
------------------
| 1 | 400 |
| 2 | 220 |
| 3 | 123 |
------------------
| **table 2** |
------------------
| 1 | 100 |
formula : table1 - table2 where table1.id=table2.id
------------------
| **Result** |
------------------
| 1 | 300 |
| 2 | 220 |
| 3 | 123 |
You want an outer join to get all rows from table_1 and the matching ones from table_2:
select t1.id, t1.val - coalesce(t2.val, 0) as result
from table_1 t1
left join table_2 t2 on t1.id = t2.id;
The coalesce(t2.val, 0) is necessary because the outer join returns null for those rows where no matching id exists in table_2, and t1.val - null would yield null.
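To see the effect of coalesce, here is a small sketch that runs the outer join with and without it on the sample data. It assumes Python's sqlite3 module with an in-memory SQLite database in place of Oracle; the column names id and val follow the answer above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, val INTEGER);
    CREATE TABLE table_2 (id INTEGER, val INTEGER);
    INSERT INTO table_1 VALUES (1, 400), (2, 220), (3, 123);
    INSERT INTO table_2 VALUES (1, 100);
""")

# With COALESCE: a missing table_2 row counts as 0.
with_coalesce = conn.execute("""
    SELECT t1.id, t1.val - COALESCE(t2.val, 0) AS result
    FROM table_1 t1
    LEFT JOIN table_2 t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

# Without COALESCE: subtracting NULL yields NULL (None in Python).
without_coalesce = conn.execute("""
    SELECT t1.id, t1.val - t2.val AS result
    FROM table_1 t1
    LEFT JOIN table_2 t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

print(with_coalesce)
print(without_coalesce)
```

The first result matches the expected table from the question, while the second one loses the values for ids 2 and 3.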
select t1.id,
nvl2(t2.val,t1.val-t2.val,t1.val) val
from t1,t2
where t1.id=t2.id(+)
order by t1.id;
Try this:
select t1.col1, t1.col2 - coalesce(t2.col2, 0) as balance from
table1 t1 left join table2 t2 on t1.col1=t2.col1
I don't know the syntax in Oracle SQL, but I can give the solution in MySQL.
Consider tables with 2 columns:
id, value
SELECT table1.id, table1.value - table2.value
FROM table1, table2
WHERE table1.id=table2.id
UNION ALL
SELECT table1.id, table1.value
FROM table1
WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.id = table1.id)
In some cases using scalar subquery caching could give better performance. It is up to the developer to compare execution plans and decide which query is the most appropriate.
with t1 (id, num) as
(
select 1, 400 from dual union all
select 2, 220 from dual union all
select 3, 123 from dual
),
t2(id, num) as
(
select 1, 100 from dual
)
select id,
num - nvl((select num from t2 where t2.id = t1.id), 0) result
from t1;
This is just to show you a different technique for solving problems in which you try to get data from several tables, but some may not have matching rows.
Using outer join in this case is in my opinion more logical.

How to combine tables of different columns into 1 result set and make all attributes of same pk in 1 row with 1 SQL statement?

I have a problem combining tables.
I have 2 tables:
t1:
id | name
----- ---------
1 | 'foo'
2 | 'bar'
t2:
id | type
------ ---------
1 | 'type1'
3 | 'type2'
I want to combine those tables into 1 result set, with all attributes of the same primary key in 1 row, using a single SQL statement in Oracle. The primary key column with the same name (id in the sample) must not appear twice.
The result should be:
id | name | type
----- --------- ---------
1 | 'foo' | 'type1'
2 | 'bar' | null
3 | null | 'type2'
Thanks in advance for any ideas and responses.
Update:
I tried Ani Menon's outer join statement, but it does not give 100% the expected result. The outer join gives a null id if the id exists in t2 but not in t1.
SELECT t1.id,t1.name,t2.type
FROM t1
FULL OUTER JOIN t2 ON t1.id=t2.id;
Returns
id | name | type
----- --------- ---------
1 | 'foo' | 'type1'
2 | 'bar' | null
null | null | 'type2'
Do a full outer join:
SELECT table1.id,table1.name,table2.type
FROM table1
FULL OUTER JOIN table2 ON table1.id=table2.id;
Edit:
Use coalesce(table1.id,table2.id) in place of table1.id in the query.
Similar to Ani's answer, but won't give you the null id:
select coalesce(table1.id, table2.id) as id, table1.name, table2.type
from table1 full outer join table2 on table1.id = table2.id;
This might not be the optimal solution, but it will work:
select allIDs.id, t1.name, t2.type
from (select id from t1
union
select id from t2) allIDs left outer join t1 on allIDs.id = t1.id left outer join t2 on allIDs.id = t2.id
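The union-of-ids approach can be tried on the sample data from the question. This is a sketch only: it assumes Python's sqlite3 module with an in-memory SQLite database rather than Oracle, and it adds an ORDER BY so the output is deterministic.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, type TEXT);
    INSERT INTO t1 VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO t2 VALUES (1, 'type1'), (3, 'type2');
""")

# Build the full list of ids first, then attach each table with a left join,
# so the id column is never null and appears only once.
combined = conn.execute("""
    SELECT allIDs.id, t1.name, t2.type
    FROM (SELECT id FROM t1
          UNION
          SELECT id FROM t2) allIDs
    LEFT OUTER JOIN t1 ON allIDs.id = t1.id
    LEFT OUTER JOIN t2 ON allIDs.id = t2.id
    ORDER BY allIDs.id
""").fetchall()

for row in combined:
    print(row)
```

Because the id comes from the union rather than from either table, the row for id 3 keeps its id even though t1 has no matching row.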
Tested in SQL Server; it worked.
create table t1 (
id int,
name varchar(25)
)
create table t2
(
id int,
type varchar(25)
)
insert into t1
values(1, 'foo'),
(2,'bar')
insert into t2
values(1,'type1'),
(3,'type2')
select id, MAX(name) name, MAX(type) type from(
select id, name, null type from t1
union all
select id, null name, type from t2) combine group by id
SELECT t1.id,t1.name,t2.type
FROM t1
LEFT OUTER JOIN t2 ON t1.id=t2.id
union
select t2.id, t1.name, t2.type
from t2
left outer join t1 on t2.id = t1.id

SQL how to simulate an xor?

I'm wondering if anybody can help me solve this question I got at a job interview. Let's say I have two tables like:
table1 table2
------------ -------------
id | name id | name
------------ -------------
1 | alpha 1 | alpha
3 | charlie 3 | charlie
4 | delta 5 | echo
8 | hotel 7 | golf
9 | india
The question was to write a SQL query that would return all the rows that are in either table1 or table2 but not both, i.e.:
result
------------
id | name
------------
4 | delta
5 | echo
7 | golf
8 | hotel
9 | india
I thought I could do something like a full outer join:
SELECT table1.*, table2.*
FROM table1 FULL OUTER JOIN table2
ON table1.id=table2.id
WHERE table1.id IS NULL or table2.id IS NULL
but that gives me a syntax error on SQL Fiddle (I don't think it supports the FULL OUTER JOIN syntax). Other than that, I can't even figure out a way to just concatenate the rows of the two tables, let alone filtering out rows that appear in both. Can somebody enlighten me and tell me how to do this? Thanks.
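Well, you could use a UNION of a LEFT JOIN and a RIGHT JOIN instead of FULL OUTER JOIN (the IS NULL filter from your query still has to be applied to get only the unmatched rows).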
SELECT * FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
UNION
SELECT * FROM table1 t1
RIGHT JOIN table2 t2 ON t1.id = t2.id
Here's a little trick I know: not equals is the same as XOR, so you could have your WHERE clause something like this:
WHERE ( table1.id IS NULL ) != ( table2.id IS NULL )
select id,name--,COUNT(*)
from(
select id,name from table1
union all
select id,name from table2
) x
group by id,name
having COUNT(*)=1
I'm sure there are lots of solutions, but the first thing that comes to mind for me is to union all the two tables, then group by id and name, and filter with a having clause on the count.
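A quick way to try the union-all-and-count idea, along with an EXCEPT-based variant for comparison, is sketched below. It assumes Python's sqlite3 module with an in-memory SQLite database, and that neither table contains duplicate rows of its own (otherwise the count would no longer mean "appears in exactly one table"). The EXCEPT branches are wrapped in subqueries because SQLite does not accept parenthesized compound selects.

```python
import sqlite3

# In-memory SQLite database (assumption; the question does not name a DBMS).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'alpha'), (3, 'charlie'), (4, 'delta'), (8, 'hotel');
    INSERT INTO table2 VALUES (1, 'alpha'), (3, 'charlie'), (5, 'echo'),
                              (7, 'golf'), (9, 'india');
""")

# Rows that occur exactly once in the concatenation of both tables.
by_count = sorted(conn.execute("""
    SELECT id, name
    FROM (SELECT id, name FROM table1
          UNION ALL
          SELECT id, name FROM table2) x
    GROUP BY id, name
    HAVING COUNT(*) = 1
""").fetchall())

# Same idea expressed as (table1 minus table2) plus (table2 minus table1).
by_except = sorted(conn.execute("""
    SELECT id, name FROM (SELECT id, name FROM table1
                          EXCEPT
                          SELECT id, name FROM table2) a
    UNION ALL
    SELECT id, name FROM (SELECT id, name FROM table2
                          EXCEPT
                          SELECT id, name FROM table1) b
""").fetchall())

print(by_count)
print(by_except)
```

Both lists should contain delta, echo, golf, hotel and india, which is the result asked for in the question.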
(
SELECT * FROM TABLE1
EXCEPT
SELECT * FROM TABLE2
)
UNION ALL
(
SELECT * FROM TABLE2
EXCEPT
SELECT * FROM TABLE1
)
This should work on most database servers
SELECT id, name
FROM table1
WHERE NOT EXISTS(SELECT NULL FROM table2 WHERE table1.id = table2.id AND table1.name = table2.name)
UNION ALL
SELECT id, name
FROM table2
WHERE NOT EXISTS(SELECT NULL FROM table1 WHERE table1.id = table2.id AND table1.name = table2.name)

SQL - Select not repeated rows from 2 tables?

I have 2 tables (perhaps they are badly built).
table1
id | word | user
1 | a | me
2 | b | dad
3 | c | mom
4 | d | sister
table2
id | word | user
1 | a | me
2 | b | dad
I want to show all rows from table1, excluding the rows that also appear in table2. In this case, the select must display rows 3 and 4 from table1.
Thanks.
Try this
Select * from Table1
Except
Select * from Table2
You did not specify what RDBMS but you can use NOT EXISTS in all databases:
select *
from table1 t1
where not exists (select *
from table2 t2
where t1.word = t2.word
and t1.user = t2.user
-- add other columns here for comparison, including id
)
Like so:
SELECT *
FROM Table1
WHERE id NOT IN(SELECT id FROM Table2);
Or: using a LEFT JOIN like so:
SELECT t1.*
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
WHERE t2.id IS NULL;
You can use EXCEPT (SQL-Server >= 2005)
SELECT id, word, user
FROM Table1
EXCEPT
SELECT id, word, user
FROM Table2;
As you did not specify what flavour of SQL you are using, it is probably wise to steer clear of EXCEPT, which not every database supports, and use a left outer join instead.
SELECT t1.*
FROM table1 AS t1
LEFT OUTER JOIN table2 AS t2
ON t1.word = t2.word
AND t1.user = t2.user
WHERE t2.id IS NULL
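The anti-join patterns from the answers above can be compared side by side. The following is a sketch that assumes Python's sqlite3 module with an in-memory SQLite database (the question does not name a DBMS); it checks that NOT EXISTS, the LEFT JOIN with an IS NULL filter, and EXCEPT agree on the sample data.

```python
import sqlite3

# In-memory SQLite database (assumption; the question does not name a DBMS).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, word TEXT, user TEXT);
    CREATE TABLE table2 (id INTEGER, word TEXT, user TEXT);
    INSERT INTO table1 VALUES (1, 'a', 'me'), (2, 'b', 'dad'),
                              (3, 'c', 'mom'), (4, 'd', 'sister');
    INSERT INTO table2 VALUES (1, 'a', 'me'), (2, 'b', 'dad');
""")

# Pattern 1: correlated NOT EXISTS on the compared columns.
not_exists_rows = sorted(conn.execute("""
    SELECT t1.id, t1.word, t1.user
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2
                      WHERE t1.word = t2.word AND t1.user = t2.user)
""").fetchall())

# Pattern 2: LEFT JOIN, keeping only rows without a match.
left_join_rows = sorted(conn.execute("""
    SELECT t1.id, t1.word, t1.user
    FROM table1 t1
    LEFT OUTER JOIN table2 t2
      ON t1.word = t2.word AND t1.user = t2.user
    WHERE t2.id IS NULL
""").fetchall())

# Pattern 3: EXCEPT, which compares whole rows.
except_rows = sorted(conn.execute("""
    SELECT id, word, user FROM table1
    EXCEPT
    SELECT id, word, user FROM table2
""").fetchall())

print(not_exists_rows)
```

Note that EXCEPT compares every selected column, whereas the other two patterns compare only the columns named in the join or subquery condition, so the three can diverge on data where, for example, the ids differ but word and user match.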