Column does not exist in the IN clause, but SQL runs - sql

I have a query that uses the IN clause. Here's a simplified version:
SELECT *
FROM table A
JOIN table B
ON A.ID = B.ID
WHERE B.AnotherColumn IN (SELECT Column FROM tableC WHERE ID = 1)
tableC doesn't have a Column column, but the query executes just fine with no error message. Can anyone explain why?

This will work if a table in the outer query has a column of that name. This is because column names from the outer query are available to the subquery, and you could be deliberately meaning to select an outer query column in your subquery SELECT list.
For example:
CREATE TABLE #test_main (colA integer)
CREATE TABLE #test_sub (colB integer)
-- Works, because colA is available to the sub-query from the outer query. However,
-- it's probably not what you intended to do:
SELECT * FROM #test_main WHERE colA IN (SELECT colA FROM #test_sub)
-- Doesn't work, because colC is nowhere in either query
SELECT * FROM #test_main WHERE colA IN (SELECT colC FROM #test_sub)
As Damien observes, the safest way to protect yourself from this none-too-obvious "gotcha" is to get into the habit of qualifying your column names in the subquery:
-- Doesn't work, because colA is not in table #test_sub, so at least you get
-- notified that what you were trying to do doesn't make sense.
SELECT * FROM #test_main WHERE colA IN (SELECT #test_sub.colA FROM #test_sub)
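For anyone who wants to reproduce this without SQL Server, here is a minimal sketch using SQLite through Python's sqlite3 module. That choice is my assumption, not the original setup. The tables are ordinary tables rather than # temp tables, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_main (colA INTEGER);
    CREATE TABLE test_sub  (colB INTEGER);
    INSERT INTO test_main VALUES (1), (2);
    INSERT INTO test_sub  VALUES (99);
""")

# Unqualified colA is not found in test_sub, so it resolves to the outer
# test_main.colA. Each outer row is compared with itself and matches.
unqualified = conn.execute(
    "SELECT colA FROM test_main "
    "WHERE colA IN (SELECT colA FROM test_sub) ORDER BY colA"
).fetchall()

# Qualifying the column forces the lookup into test_sub, which has no colA,
# so the statement is rejected instead of silently running.
try:
    conn.execute(
        "SELECT colA FROM test_main "
        "WHERE colA IN (SELECT test_sub.colA FROM test_sub)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified)
print(qualified_error)
```

The unqualified query returns every row of test_main even though test_sub contains nothing related to them, which is exactly the silent behavior the question ran into.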

If you want to avoid this situation in the future (that Matt Gibson has explained), it's worth getting into the habit of always using aliases to specify columns. E.g.:
SELECT *
FROM table A
JOIN table B
ON A.ID = B.ID
WHERE B.AnotherColumn IN (SELECT C.Column FROM tableC C WHERE C.ID = 1)
This would have given you a nice error message (note I also specified the alias in the where clause - if there wasn't an ID column in tableC, you'd have also had additional problems)

Related

Cross joining tables to see which partners in one table have a report from another table [duplicate]

table1 (id, name)
table2 (id, name)
Query:
SELECT name
FROM table2
-- that are not in table1 already
SELECT t1.name
FROM table1 t1
LEFT JOIN table2 t2 ON t2.name = t1.name
WHERE t2.name IS NULL
Q: What is happening here?
A: Conceptually, we select all rows from table1 and for each row we attempt to find a row in table2 with the same value for the name column. If there is no such row, we just leave the table2 portion of our result empty for that row. Then we constrain our selection by picking only those rows in the result where the matching row does not exist. Finally, we ignore all fields from our result except for the name column (the one we are sure exists, from table1).
While it may not be the most performant method possible in all cases, it should work in practically every database engine that attempts to implement ANSI SQL-92.
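The two conceptual steps described above can be watched separately. This is only a sketch: it assumes SQLite through Python's sqlite3 module, and the sample names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO table2 VALUES (1, 'alice'), (2, 'carol');
""")

# Step 1: the LEFT JOIN alone. Rows of table1 without a partner in table2
# get NULL (None in Python) for the table2 columns.
joined = conn.execute("""
    SELECT t1.name, t2.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.name = t1.name
    ORDER BY t1.name
""").fetchall()

# Step 2: keep only the rows where the table2 side is empty.
missing = conn.execute("""
    SELECT t1.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.name = t1.name
    WHERE t2.name IS NULL
""").fetchall()

print(joined)   # [('alice', 'alice'), ('bob', None), ('carol', 'carol')]
print(missing)  # [('bob',)]
```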
You can either do
SELECT name
FROM table2
WHERE name NOT IN
(SELECT name
FROM table1)
or
SELECT name
FROM table2
WHERE NOT EXISTS
(SELECT *
FROM table1
WHERE table1.name = table2.name)
See this question for 3 techniques to accomplish this
I don't have enough rep points to vote up froadie's answer. But I have to disagree with the comments on Kris's answer. The following answer:
SELECT name
FROM table2
WHERE name NOT IN
(SELECT name
FROM table1)
Is FAR more efficient in practice. I don't know why, but I'm running it against 800k+ records and the difference is tremendous with the advantage given to the 2nd answer posted above. Just my $0.02.
SELECT <column_list>
FROM TABLEA a
LEFT JOIN TABLEB b
ON a.Key = b.Key
WHERE b.Key IS NULL;
https://www.cloudways.com/blog/how-to-join-two-tables-mysql/
This is pure set theory which you can achieve with the minus operation.
select id, name from table1
minus
select id, name from table2
Here's what worked best for me.
SELECT *
FROM #T1
EXCEPT
SELECT a.*
FROM #T1 a
JOIN #T2 b ON a.ID = b.ID
This was more than twice as fast as any other method I tried.
Watch out for pitfalls. If the name column in table1 contains NULLs, you are in for a surprise: NOT IN will return no rows at all.
A safer version is:
SELECT name
FROM table2
WHERE name NOT IN
(SELECT ISNULL(name ,'')
FROM table1)
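A small sketch of the NULL pitfall, assuming SQLite through Python's sqlite3 module. SQLite has no two-argument ISNULL function, so COALESCE stands in for it here. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table1 VALUES ('alice'), (NULL);
    INSERT INTO table2 VALUES ('alice'), ('bob');
""")

# 'bob' NOT IN ('alice', NULL) evaluates to NULL, not true, so the row is
# filtered out and the query returns nothing.
naive = conn.execute("""
    SELECT name FROM table2
    WHERE name NOT IN (SELECT name FROM table1)
""").fetchall()

# Replacing NULL with '' inside the subquery removes the NULL from the list.
guarded = conn.execute("""
    SELECT name FROM table2
    WHERE name NOT IN (SELECT COALESCE(name, '') FROM table1)
""").fetchall()

print(naive)    # []
print(guarded)  # [('bob',)]
```

One side effect to keep in mind: with the '' substitution, a row in table2 whose name is an empty string would be treated as present in table1.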
You can use EXCEPT in SQL Server or MINUS in Oracle; they are equivalent according to:
http://blog.sqlauthority.com/2008/08/07/sql-server-except-clause-in-sql-server-is-similar-to-minus-clause-in-oracle/
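SQLite also spells this operator EXCEPT, so the set-difference approach can be tried with a quick sketch like the one below. It assumes Python's sqlite3 module and invented sample rows. Oracle's MINUS is not exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO table2 VALUES (1, 'alice'), (3, 'carol');
""")

# EXCEPT compares whole rows: a row is removed only if every selected
# column matches a row produced by the second SELECT.
only_in_table1 = sorted(conn.execute("""
    SELECT id, name FROM table1
    EXCEPT
    SELECT id, name FROM table2
""").fetchall())

print(only_in_table1)  # [(2, 'bob')]
```

Because whole rows are compared, selecting id and name together gives a different answer from comparing on name alone whenever the ids differ between the tables.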
This worked well for me:
SELECT *
FROM [dbo].[table1] t1
LEFT JOIN [dbo].[table2] t2 ON t1.[t1_ID] = t2.[t2_ID]
WHERE t2.[t2_ID] IS NULL
You can use the following query structure:
SELECT t1.name FROM table1 t1 JOIN table2 t2 ON t2.fk_id != t1.id;
table1:
id  name
1   Amit
2   Sagar
table2:
id  fk_id  email
1   1      amit@ma.com
Output:
name
Sagar
All the above queries are incredibly slow on big tables. A change of strategy is needed. Here is the code I used for one of my databases; you can adapt it by changing the field and table names.
This is the strategy: you create two implicit temporary tables (derived tables) and take their union.
The first temporary table selects all the rows of the first original table, the one whose values you want to check for absence from the second original table.
The second implicit temporary table contains all the rows where the two original tables match on the column you want to check.
The result of the union has more than one row for a given value of the checked column when that value matches in both original tables (one row from the first select, one from the second select), and just one row when the value from the first original table matches nothing in the second original table.
You group and count. When the count is 1 there is no match, so finally you select just the rows with a count equal to 1.
It is not elegant, but on my database it was orders of magnitude faster than all the above solutions.
IMPORTANT NOTE: make sure the columns being compared are indexed.
SELECT name, source, id
FROM
(
SELECT name, "active_ingredients" as source, active_ingredients.id as id
FROM active_ingredients
UNION ALL
SELECT active_ingredients.name as name, "UNII_database" as source, temp_active_ingredients_aliases.id as id
FROM active_ingredients
INNER JOIN temp_active_ingredients_aliases ON temp_active_ingredients_aliases.alias_name = active_ingredients.name
) tbl
GROUP BY name
HAVING count(*) = 1
ORDER BY name
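Here is the same strategy cut down to two generic tables, as a sketch only. It assumes SQLite through Python's sqlite3 module, hypothetical table and index names, and names that are unique within table1 (otherwise a duplicated name would never have a count of 1). It says nothing about speed, only about what the query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    -- Index the compared columns, as the note above recommends.
    CREATE INDEX idx_table1_name ON table1 (name);
    CREATE INDEX idx_table2_name ON table2 (name);
    INSERT INTO table1 VALUES (1, 'aspirin'), (2, 'ibuprofen'), (3, 'codeine');
    INSERT INTO table2 VALUES (10, 'aspirin'), (11, 'codeine');
""")

unmatched = conn.execute("""
    SELECT name
    FROM (
        -- every name from table1, once
        SELECT name FROM table1
        UNION ALL
        -- every name from table1 again, once per match in table2
        SELECT t1.name
        FROM table1 t1
        INNER JOIN table2 t2 ON t2.name = t1.name
    ) tbl
    GROUP BY name
    HAVING COUNT(*) = 1   -- seen only in the first SELECT, so no match
    ORDER BY name
""").fetchall()

print(unmatched)  # [('ibuprofen',)]
```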
See query:
SELECT * FROM Table1 WHERE
id NOT IN (SELECT
e.id
FROM
Table1 e
INNER JOIN
Table2 s ON e.id = s.id);
Conceptually: the subquery fetches the matching records, and the main query then fetches the records that are not in the subquery.
First define aliases for the tables, such as t1 and t2.
Then read the records of the second table.
Finally, match each record against the first table in the WHERE condition:
SELECT name FROM table2 as t2
WHERE NOT EXISTS (SELECT * FROM table1 as t1 WHERE t1.name = t2.name)
I'm going to repost (since I'm not cool enough yet to comment) in the correct answer....in case anyone else thought it needed better explaining.
SELECT temp_table_1.name
FROM original_table_1 temp_table_1
LEFT JOIN original_table_2 temp_table_2 ON temp_table_2.name = temp_table_1.name
WHERE temp_table_2.name IS NULL
And I've seen syntax in FROM needing commas between table names in MySQL, but SQLite seemed to prefer the space.
The bottom line is when you use bad variable names it leaves questions. My variables should make more sense. And someone should explain why we need a comma or no comma.
I tried all solutions above but they did not work in my case. The following query worked for me.
SELECT NAME
FROM table_1
WHERE NAME NOT IN
(SELECT a.NAME
FROM table_1 AS a
LEFT JOIN table_2 AS b
ON a.NAME = b.NAME
WHERE <any further condition>);

SQL 0 results for 'Not In' and 'In' when row does exist

I have a table (A) with a list of order numbers. It contains a single row.
Once this order has been processed it should be deleted. However, it is failing to be deleted.
I began investigating, a really simple query is performed for the deletion.
delete from table(A) where orderno not in (select distinct orderno from tableB)
The order number absolutely does not exist in tableB.
I changed the query in SSMS to :
select * from table(A) where orderno not in (select distinct orderno from tableB)
This returned 0 rows. Bear in mind the orderno does exist in tableA.
I then changed the query from "not in" to "In". It still returned 0 rows. How can this be possible that a value is not in a list of values but also not show for the opposite?
Things I have tried:
Two additional developers to look over it.
ltrim(rtrim()) on both the select values.
Various char casts and casting the number as an int.
Has anyone experienced this?
Don't use NOT IN with a subquery. Use NOT EXISTS instead:
delete from tableA
where not exists (select 1 from tableB where tableA.orderno = tableB.orderno);
What is the difference? If any orderno in tableB is NULL, then NOT IN never evaluates to true: for a value with no match it returns NULL instead, so no rows qualify. This is correct behavior based on how NULL is defined in SQL, but it is counterintuitive. NOT EXISTS does what you want.
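To see the two deletes side by side, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module, and the order numbers are invented. The key ingredient is the single NULL orderno in tableB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (orderno INTEGER);
    CREATE TABLE tableB (orderno INTEGER);
    INSERT INTO tableA VALUES (1001);
    INSERT INTO tableB VALUES (2002), (NULL);
""")

def rows_left():
    return conn.execute("SELECT COUNT(*) FROM tableA").fetchone()[0]

# 1001 NOT IN (2002, NULL) is NULL, so the WHERE clause rejects the row.
conn.execute(
    "DELETE FROM tableA "
    "WHERE orderno NOT IN (SELECT orderno FROM tableB)"
)
left_after_not_in = rows_left()

# NOT EXISTS only asks whether a matching row was found, so the NULL in
# tableB is harmless and the row is deleted.
conn.execute(
    "DELETE FROM tableA "
    "WHERE NOT EXISTS (SELECT 1 FROM tableB "
    "                  WHERE tableB.orderno = tableA.orderno)"
)
left_after_not_exists = rows_left()

print(left_after_not_in, left_after_not_exists)  # 1 0
```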
You can use not exists
select *
from table(A) a
where not exists (select 1 from tableB where orderno = a.orderno);
I have experienced the same.
Try joining the two tables, tableA and TableB:
select * from TableA a
inner join TableB b on a.orderno =b.orderno
This should allow you to get the records and then you can delete the same.

Replace null with ID from right table in join

I am doing a select with a right join. I have some rows with IDs from table A, but not all rows have corresponding IDs in table B.
I wish to replace the null IDs from table B with IDs from table A, based on another row.
That alternative row has no null values. How can it be done?
SQL has a command for that...
ISNULL(TableB.Id, TableA.Id) AS SomeId
In the event of TableB.Id being NULL, you'll get TableA.Id as SomeId.
THIS ADDRESSES THE ORIGINAL VERSION OF THE QUESTION.
First, I strongly recommend left join over right join. It is easier to read a query when the logic is "keep all the rows in the first table (and I know what that is)".
You can use coalesce(). In a select query, you would do:
select coalesce(b.id, a.id)
from a left join
b
on a.anothercol = b.anothercol;
However, I suspect you intend:
update b
set id = (select a.id from a where a.anothercol = b.anothercol)
where id is null;
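A quick sketch of the coalesce() select, assuming SQLite through Python's sqlite3 module. The column anothercol comes from the answer above; the sample rows are hypothetical. It covers both a b row whose id is NULL and an a row with no b row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, anothercol TEXT);
    CREATE TABLE b (id INTEGER, anothercol TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (10, 'x'), (NULL, 'y');
""")

rows = conn.execute("""
    SELECT a.anothercol, COALESCE(b.id, a.id) AS some_id
    FROM a
    LEFT JOIN b ON a.anothercol = b.anothercol
    ORDER BY a.anothercol
""").fetchall()

# 'x' keeps b.id, 'y' falls back because b.id is NULL,
# 'z' falls back because there is no row in b at all.
print(rows)  # [('x', 10), ('y', 2), ('z', 3)]
```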

Get rid off from matching record and add not equal data

I have following tables:
Table a:
Name
T1
T2
T3
T4
Table b:
Name
T1
T2
T3
T4
T5
T6
I need to select all from table a and add what is not in table a from table b, result below:
T1
T2
T3
T4
T5
T6
Thanks for help
If you want all unique names from both the tables, use UNION:
select name from table_a
union
select name from table_b;
Here is another way:
select ta.name from ta
union all
select tb.name from tb
left join ta
on tb.name = ta.name
where ta.name is null
I would do this with an anti-join (a NOT IN condition). As written below, it will not work correctly if NULL is possible in that column in table a (in that case, the anti-join should be written with a NOT EXISTS condition). I assume the column is NOT NULL.
An anti-join is faster than a join, because as soon as a value from table b is also found in a, the joining for that row of table b stops and processing moves on to the next row. In a join, the joining continues, there is no such short-circuiting.
Oto's solution uses a join rather than an anti-join. However, I believe the Oracle query optimizer recognizes, in this simple case, that an anti-join is sufficient, and it will rewrite the query to use an anti-join. This is something you can verify by running Explain Plan on both queries. With that said, in a similar but much more complicated problem, the optimizer may not be able to "see" this shortcut; this is why I believe it's best to write anti-joins (and semi-joins, where we use IN or EXISTS conditions) explicitly, rather than rely on the optimizer.
The query should be
select name from a
union all
select name from b where name not in ( select name from a );
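With the sample data from the question, plain UNION and the UNION ALL plus NOT IN version give the same six names. The sketch below checks that. It assumes SQLite through Python's sqlite3 module and, as stated above, a name column with no NULLs. The table names table_a and table_b are used for both queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (name TEXT NOT NULL);
    CREATE TABLE table_b (name TEXT NOT NULL);
    INSERT INTO table_a VALUES ('T1'), ('T2'), ('T3'), ('T4');
    INSERT INTO table_b VALUES ('T1'), ('T2'), ('T3'), ('T4'), ('T5'), ('T6');
""")

# UNION removes duplicates across both inputs.
via_union = sorted(conn.execute("""
    SELECT name FROM table_a
    UNION
    SELECT name FROM table_b
""").fetchall())

# UNION ALL keeps everything, so the second branch must exclude names
# that table_a already supplied.
via_anti_join = sorted(conn.execute("""
    SELECT name FROM table_a
    UNION ALL
    SELECT name FROM table_b
    WHERE name NOT IN (SELECT name FROM table_a)
""").fetchall())

print([name for (name,) in via_union])
```

The two differ if table_a itself holds repeated names: UNION collapses them, while the UNION ALL version keeps each copy.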
Here's one way to do that:
Select distinct Name
from (
select Name from Table A
UNION ALL
select Name from Table B
)

SQL Select queries

Which is better and what is the difference?
SELECT * FROM TABLE_A A WHERE A.ID IN (SELECT B.ID FROM TABLE_B B)
or
SELECT * FROM TABLE_A A, TABLE_B B WHERE A.ID = B.ID
The "best" way is to use the standard ANSI JOIN syntax:
SELECT (columns)
FROM TABLE_A a
INNER JOIN TABLE_B b
ON b.ID = a.ID
The first WHERE IN version will often result in the same execution plan, but on certain platforms it can be slower - it's not always consistent. The IN query (which is equivalent to EXISTS) is also going to become progressively more cumbersome to write and maintain as you start to add more tables or create more complex join conditions - it's not as flexible as an actual JOIN.
The second, comma-separated syntax is not as consistently supported as JOIN. It does work on most SQL DBMSes, but it's not the "preferred" version because if you leave out the WHERE clause then you end up with a cross-product. Whereas if you forget to write in the JOIN condition, you'll just end up with a syntax error. JOIN tends to be preferred because of this safety net.
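The cross-product risk with the comma syntax is easy to see by counting rows. A minimal sketch, assuming SQLite through Python's sqlite3 module and made-up IDs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (ID INTEGER);
    CREATE TABLE TABLE_B (ID INTEGER);
    INSERT INTO TABLE_A VALUES (1), (2), (3);
    INSERT INTO TABLE_B VALUES (1), (2);
""")

# Comma join with its WHERE clause: only the matching pairs.
with_where = conn.execute(
    "SELECT COUNT(*) FROM TABLE_A A, TABLE_B B WHERE A.ID = B.ID"
).fetchone()[0]

# Same query with the WHERE clause forgotten: every A row paired with
# every B row, and no error to warn you.
without_where = conn.execute(
    "SELECT COUNT(*) FROM TABLE_A A, TABLE_B B"
).fetchone()[0]

print(with_where, without_where)  # 2 6
```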
I upvoted @Aaronaught's answer, but I have some comments:
Both the comma-style join syntax and the JOIN syntax are ANSI. The first is SQL-89, and the second is SQL-92. The SQL-89 syntax is still part of the standard, to support backward compatibility.
Can you give an example of an RDBMS that supports the SQL-92 syntax but not the SQL-89? I don't think there are any, so "not as consistently supported" may not be accurate.
You can also omit the join condition using JOIN syntax, and create a Cartesian product. Example: SELECT ... FROM A JOIN B is valid (correction: this is true only in some brands that implement the standard syntax loosely, such as MySQL).
But in any case I agree this is easier to spot when you use SQL-92 syntax. If you use SQL-89 syntax you may end up with a long WHERE clause and it's too easy to miss one of your join conditions.
The difference is that the first does a subquery which can be slower in some databases. And the second does a join, combining both tables in the same query.
Generally, the second would be faster if the database won't optimize it since with a subquery the database would have to keep the results of the subquery in memory.
These two queries return different results. You select only columns from TABLE_A in the first.
There are at least three differences between query X:
SELECT * FROM TABLE_A A WHERE A.ID IN (SELECT B.ID FROM TABLE_B B)
and Y:
SELECT * FROM TABLE_A A, TABLE_B B WHERE A.ID = B.ID
1) As Michas said, the set of columns will be different, where query Y will return the columns from tables A & B, but query X only returns the columns from table A. If you explicitly name which columns you want back, query X can only include columns from table A, but query Y can also include columns from table B.
2) The number of rows may be different. If table B has more than one ID matching an ID from table A, then more rows will be returned with Query Y than X.
create table TABLE_A (ID int, st VARCHAR(10))
create table TABLE_B (ID int, st VARCHAR(10))
insert into TABLE_A values (1, 'A-a')
insert into TABLE_B values (1, 'B-a')
insert into TABLE_B values (1, 'B-b')
SELECT * FROM TABLE_A A WHERE A.ID IN (SELECT B.ID FROM TABLE_B B)
ID          st
----------- ----------
1           A-a
(1 row(s) affected)
SELECT * FROM TABLE_A A, TABLE_B B WHERE A.ID = B.ID
ID          st         ID          st
----------- ---------- ----------- ----------
1           A-a        1           B-a
1           A-a        1           B-b
(2 row(s) affected)
3) The execution plans will probably be different, since the queries are asking the database for different results. Inner joins used to run faster than IN or EXISTS and may still run faster in some cases. But since the results can be different, you need to make sure that the data supports the transformation from an IN or EXISTS to a join.
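One way to check whether the data supports that transformation is to look for repeated IDs in TABLE_B before switching from IN to a join. The sketch below reuses the sample rows from point 2. Running it on SQLite through Python's sqlite3 module, and the duplicate check itself, are my own additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (ID INTEGER, st VARCHAR(10));
    CREATE TABLE TABLE_B (ID INTEGER, st VARCHAR(10));
    INSERT INTO TABLE_A VALUES (1, 'A-a');
    INSERT INTO TABLE_B VALUES (1, 'B-a');
    INSERT INTO TABLE_B VALUES (1, 'B-b');
""")

in_rows = conn.execute("""
    SELECT A.ID, A.st FROM TABLE_A A
    WHERE A.ID IN (SELECT B.ID FROM TABLE_B B)
""").fetchall()

join_rows = conn.execute("""
    SELECT A.ID, A.st FROM TABLE_A A
    INNER JOIN TABLE_B B ON B.ID = A.ID
""").fetchall()

# Any ID listed here will multiply rows of TABLE_A in the join version.
duplicate_ids = conn.execute("""
    SELECT ID FROM TABLE_B
    GROUP BY ID
    HAVING COUNT(*) > 1
""").fetchall()

print(len(in_rows), len(join_rows), duplicate_ids)  # 1 2 [(1,)]
```

If duplicate_ids comes back empty, the IN query and the join return the same rows of TABLE_A.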