Wrong SELECT subquery in 'IN' condition - sql

I made a query like this
SELECT *
FROM TABLE_A
WHERE 1=1
AND ID_NO IN (
SELECT ID_NO
FROM TABLE_B
WHERE SEQ = '1'
)
The problem is that there is no column 'ID_NO' in TABLE_B, so I expected the query to fail.
But the query ran. I don't understand why.
Why didn't it cause an error?

The query is valid if table_B does not have a column named ID_NO but table_A does. You then have a correlated subquery, where the ID_NO in the subquery's select list refers to the outer ID_NO attribute of table_A (this presumably makes no sense, but it is correct as far as the compiler is concerned).
Consider the following schema:
create table table_a (
id_no int
);
create table table_b (
other_id_no int
);
insert into table_a values (1),(2);
insert into table_b values (1),(3);
Then the following query compiles, but as long as table_b contains at least one row it yields an empty result, because it effectively means where id_no not in (id_no):
select * from table_a where id_no not in (select id_no from table_b);
When dealing with subqueries, I'd suggest using table aliases to avoid such unintended behaviour. For example, the following query does not compile, and the compiler tells you what is wrong:
select * from table_a a where a.id_no not in (select b.id_no from table_b b);
Error: Unknown column 'b.id_no' in 'field list'
Correcting the error then leads to:
select * from table_a a where a.id_no not in (select b.other_id_no from table_b b);
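The three queries above can be replayed end to end. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in engine (an assumption, the original answer shows a MySQL-style error message), and the variable names are made up for illustration. The exact error text differs between databases, so the sketch only records that an error occurred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table_a (id_no int);
    create table table_b (other_id_no int);
    insert into table_a values (1),(2);
    insert into table_b values (1),(3);
""")

# Unqualified: id_no is not a column of table_b, so it binds to table_a.id_no.
unqualified = conn.execute(
    "select * from table_a where id_no not in (select id_no from table_b)"
).fetchall()

# Qualified: b.id_no cannot fall back to the outer table, so the statement is rejected.
try:
    conn.execute(
        "select * from table_a a where a.id_no not in (select b.id_no from table_b b)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

# Corrected: compare against the column that really exists in table_b.
corrected = conn.execute(
    "select * from table_a a "
    "where a.id_no not in (select b.other_id_no from table_b b) "
    "order by a.id_no"
).fetchall()

print(unqualified, qualified_error, corrected)
```

With the sample data, the unqualified query silently returns nothing, the aliased query fails at compile time, and the corrected query returns the row with id_no 2.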

Related

How to create a select clause using a subquery

I have the following sql statement:
WITH
subquery AS (
select distinct id from a_table where some_field in (1,2)
)
select id from another_table where id in subquery;
But that obviously does not work. The id field exists in both tables (under a different name, but the values are the same: numeric ids). Basically, I want to filter by the result of the subquery, like a kind of intersection.
Any idea how to write that query correctly?
Edit: JOIN is not an option (this is just a reduced example of a bigger query).
You need a subquery for the second operand of IN that SELECTs from the CTE.
... IN (SELECT id FROM subquery) ...
But I would recommend rewriting it as a JOIN.
Are you able to join on id and then filter in the WHERE clause?
select a.id
from another_table a
inner join a_table b on a.id = b.id
where b.some_field in (1,2)
Since you only want the id from another_table you can use exists
with s as (
select id
from a_table
where some_field in (1,2)
)
select id
from another_table t
where exists ( select * from s where s.id=t.id )
But the CTE is really redundant since all you are doing is
select id
from another_table t
where exists (
select * from a_table a where a.id=t.id and a.some_field in (1,2)
)
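To compare the variants suggested in this thread, here is a small sketch using Python's sqlite3 module (an assumed stand-in, since the question does not name its database). The sample rows are invented for illustration. It runs the IN-over-CTE form, the EXISTS form, and a plain inner join against the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a_table (id int, some_field int);
    create table another_table (id int);
    -- id 2 matches the filter twice; id 3 does not match at all
    insert into a_table values (1, 1), (2, 2), (2, 1), (3, 9);
    insert into another_table values (1), (2), (3), (4);
""")

# IN needs a subquery that selects from the CTE, not the bare CTE name.
in_cte = conn.execute("""
    with subquery as (
        select distinct id from a_table where some_field in (1, 2)
    )
    select id from another_table
    where id in (select id from subquery)
    order by id
""").fetchall()

# Same filter written with EXISTS and no CTE.
exists_rows = conn.execute("""
    select id from another_table t
    where exists (
        select * from a_table a where a.id = t.id and a.some_field in (1, 2)
    )
    order by t.id
""").fetchall()

# A plain inner join repeats an id once per matching a_table row.
join_rows = conn.execute("""
    select t.id from another_table t
    inner join a_table a on a.id = t.id
    where a.some_field in (1, 2)
    order by t.id
""").fetchall()

print(in_cte, exists_rows, join_rows)
```

On this data the IN and EXISTS forms agree, while the join returns id 2 twice. That is one reason a join is not always a drop-in replacement for a semi-join filter unless the join key is unique or a distinct is added.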

Why select invalid field in subquery could run in BigQuery?

For the following sql
CREATE or replace TABLE
temp.t1 ( a STRING)
;
insert into temp.t1 values ('val_a');
CREATE or replace TABLE
temp.t2 (b STRING)
;
insert into temp.t2 values ('val_b');
create or replace table `temp.a1` as
select distinct b
from temp.t2
;
select distinct a
from `temp.t1`
where a in (select distinct a from `temp.a1`)
;
There is no column a in temp.a1, so I expected an error here. However, the output of BigQuery is
Row a
1 val_a
Why does this happen?
On the other hand, running select distinct a from temp.a1; on its own fails with the error Unrecognized name: a.
Your query is:
select distinct a
from `temp.t1`
where a in (select distinct a from `temp.a1`);
You think this should be:
select distinct t1.a
from `temp.t1` t1
where t1.a in (select distinct a1.a from `temp.a1` a1);
And hence generate an error. However, the rules of SQL interpret this as:
select distinct t1.a
from `temp.t1` t1
where t1.a in (select distinct t1.a from `temp.a1` a1);
This is because the scoping rules say that if a is not found in the subquery, it is looked up in the outer query.
That is how SQL is defined.
The solution? Always qualify column references. Qualify means to include the table alias in the reference.
Also note that select distinct is meaningless in the subquery for an in, because in does not create duplicates. You should get rid of the distinct in the subquery.

SQL - concatenated string in where clause not working as expected - Redshift

This is the SQL (Redshift) I am referring to:
SELECT *
FROM table_a
WHERE col_a||col_b NOT IN
(
SELECT col_a||col_b
from table_b
);
There are values in table_a which don't exist in table_b, yet this always returns no rows.
Any insight?
Use NOT EXISTS, not NOT IN!!! NOT IN will return no rows at all if any value in the subquery is NULL. Plus, you are comparing multiple columns by concatenating them. That is likely to be a bad method because of collisions: ab/c matches a/bc.
So use this:
SELECT a.*
FROM table_a a
WHERE NOT EXISTS (SELECT 1
FROM table_b b
WHERE b.col_a = a.col_a AND b.col_b = a.col_b
);
The subquery might be returning a null value, which makes the NOT IN condition fail for every row.
Use coalesce to replace null with an appropriate value based on your data. I have used a ~ character in place of null.
SELECT *
FROM table_a
WHERE col_a||col_b NOT IN
(
SELECT coalesce (col_a,'~')||coalesce (col_b , '~')
from table_b
);
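Both effects mentioned in these answers (a NULL in the subquery emptying a NOT IN result, and concatenation collisions such as ab/c versus a/bc) can be reproduced on a toy data set. This is a sketch only: it uses Python's sqlite3 module rather than Redshift, and the sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table_a (col_a text, col_b text);
    create table table_b (col_a text, col_b text);
    insert into table_a values ('x', 'y'), ('ab', 'c');
    -- 'a' || 'bc' collides with 'ab' || 'c'; the NULL makes one concatenation NULL
    insert into table_b values ('a', 'bc'), ('p', null);
""")

# Original query: one NULL in the subquery result means no row can pass NOT IN.
not_in = conn.execute("""
    select * from table_a
    where col_a || col_b not in (select col_a || col_b from table_b)
""").fetchall()

# COALESCE removes the NULL, but the concatenation collision remains.
not_in_coalesce = conn.execute("""
    select * from table_a
    where col_a || col_b not in (
        select coalesce(col_a, '~') || coalesce(col_b, '~') from table_b
    )
    order by col_a
""").fetchall()

# NOT EXISTS compares the columns separately and is not affected by either issue.
not_exists = conn.execute("""
    select a.* from table_a a
    where not exists (
        select 1 from table_b b
        where b.col_a = a.col_a and b.col_b = a.col_b
    )
    order by a.col_a
""").fetchall()

print(not_in, not_in_coalesce, not_exists)
```

Here the original query returns nothing, the coalesce variant wrongly drops ('ab', 'c') because it concatenates to the same string as ('a', 'bc'), and only NOT EXISTS returns both rows of table_a.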

select * from table where column in (sub query) where sub query return ORA-00904 [duplicate]

The query:
SELECT COLUMN_NAME FROM MY_TABLE
returns
ORA-00904 Invalid identifier, because there is no COLUMN_NAME column in MY_TABLE. So far so good.
The query:
SELECT *
FROM OTHER_TABLE
WHERE COLUMN_NAME IN (SELECT COLUMN_NAME FROM MY_TABLE)
Not only does it not fail, it returns the complete OTHER_TABLE. This happens only when the inner query selects a column that exists in the “outside” table.
If I run the same query and just change the inner query's select column to a different column that exists neither in MY_TABLE nor in the outside table:
SELECT *
FROM OTHER_TABLE
WHERE COLUMN_NAME IN (SELECT DIFFERENT_NAME FROM MY_TABLE)
(the DIFFERENT_NAME column does not exist in OTHER_TABLE)
it does fail with ORA-00904 Invalid identifier.
1. How come the query that uses a column that exists in the outer table but not in the inner table does not fail?
2. How come it returns the complete table?
Imagine that we have two tables: TA with field A and TB with field B. Now let's write some queries:
select A -- wrong: TB doesn't have A field
from TB
But this one will be OK and return the entire TB table, provided that the B field is not null and TA is not empty:
select *
from TB
where B in (select B -- <- B is from TB in both cases
from TA)
In this case
where B in (select B from TA)
is equivalent to
-- 1. null in (...) is null, not true
-- 2. TA must not be empty
where (B is not null) and Exists (select 1 from TA)
And, finally
select *
from TB
where B in (select C -- wrong: neither TB nor TA has a field C
from TA)
You can use columns from the "outer" table inside the query in the in clause. For each row of the outer table, that row's column_name value is selected from the inner table (much as if you were selecting a literal value). Since it is just the same column_name value of the row from the outer query, the two are obviously equal, so the condition is fulfilled and the row is returned.
A good defensive practice to avoid such mistakes is to fully qualify the columns you're querying (preferably using table aliases), so the query would error out instead of returning something you don't expect:
SELECT *
FROM other_table ot
WHERE ot.column_name IN (SELECT mt.column_name -- causes error!
FROM my_table mt)
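The equivalence described in the first answer (B in (select B from TA) behaves like B is not null and exists (select 1 from TA)) can be checked on a small data set. The sketch below uses Python's sqlite3 module instead of Oracle, which is an assumption; the sample values and variable names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table TA (A int);
    create table TB (B int);
    insert into TB values (1), (2), (null);
""")

# B does not exist in TA, so inside the subquery it refers to TB.B.
q_in = "select B from TB where B in (select B from TA) order by B"
q_equivalent = (
    "select B from TB "
    "where (B is not null) and exists (select 1 from TA) order by B"
)

# With TA empty, the subquery yields no rows and nothing is returned.
empty_ta = (conn.execute(q_in).fetchall(), conn.execute(q_equivalent).fetchall())

# With at least one row in TA, every TB row with a non-null B is returned.
conn.execute("insert into TA values (10)")
filled_ta = (conn.execute(q_in).fetchall(), conn.execute(q_equivalent).fetchall())

print(empty_ta, filled_ta)
```

Both formulations return no rows while TA is empty, and both return the rows 1 and 2 (but not the null row) once TA contains a row, regardless of the value stored in TA.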

SQL: how to find unused primary key

I've got a table with > 1'000'000 entries; this table is referenced from about 130 other tables. My problem is that a lot of those 1 million entries are old and unused.
What's the fastest way to find the entries not referenced by any of the other tables? I'd rather not do a
select * from (
select * from table-a TA
minus
select * from table-a TA where TA.id in (
select "ID" from (
(select distinct FK-ID "ID" from table-b)
union all
(select distinct FK-ID "ID" from table-c)
...
Is there an easier, more general way?
Thank you all!
You could do this:
select * from table_a a
where not exists (select * from table_b where fk_id = a.id)
and not exists (select * from table_c where fk_id = a.id)
and not exists (select * from table_d where fk_id = a.id)
...
Try:
select a.*
from table_a a
left join table_b b on a.id=b.fk_id
left join table_c c on a.id=c.fk_id
left join table_d d on a.id=d.fk_id
left join table_e e on a.id=e.fk_id
......
where b.fk_id is null
and c.fk_id is null
and d.fk_id is null
and e.fk_id is null
.....
You might also try:
select a.*
from table_a a
left join
(select b.fk_id from table_b b union
select c.fk_id from table_c c union
...) table_union on a.id=table_union.fk_id
where table_union.fk_id is null
This is more SQL oriented and it will not take forever like the above solution.
Not sure about efficiency but:
select * from table_a
where id not in (
select id from table_b
union
select id from table_c )
If your concern is allowing the database to continue normal operations while you do the housekeeping, you could split it into multiple stages:
insert into tblIds
select id from table_b
union
select id from table_c
as many times as you need, and then:
delete from table_a where id not in ( select id from tblIds )
Of course sometimes doing a lot of processing takes a lot of time.
I like @Patrick's answer above, but I would like to add to that.
Rather than building the 130-step query by hand, you could build these INSERT statements by scanning sysObjects, finding key relations and generating your INSERT statements.
That would not only save you time, but should also help you to know for sure whether you've covered all the tables - maybe there are 131, or only 129.
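As a rough illustration of generating the statement from catalog metadata instead of typing it by hand, here is a sketch in Python using sqlite3. Everything about the catalog lookup is an assumption made for the demo: SQLite's PRAGMA foreign_key_list stands in for sysObjects (or for Oracle's constraint views), and the helper names referencing_columns and unused_rows_query are hypothetical. It builds one NOT EXISTS clause per discovered foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table_a (id integer primary key);
    create table table_b (fk_id integer references table_a(id));
    create table table_c (fk_id integer references table_a(id));
    insert into table_a values (1), (2), (3);
    insert into table_b values (1);
    insert into table_c values (3);
""")

def referencing_columns(conn, parent):
    """Return (child_table, child_column) pairs whose foreign key points at parent."""
    pairs = []
    tables = conn.execute(
        "select name from sqlite_master where type = 'table' order by name"
    ).fetchall()
    for (table,) in tables:
        # Row layout: id, seq, table, from, to, on_update, on_delete, match
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2].lower() == parent.lower():
                pairs.append((table, fk[3]))
    return pairs

def unused_rows_query(conn, parent, key="id"):
    """Build a query with one NOT EXISTS clause per referencing column."""
    clauses = [
        f'not exists (select 1 from "{child}" c where c."{col}" = p."{key}")'
        for child, col in referencing_columns(conn, parent)
    ]
    where = " and ".join(clauses) or "1 = 1"
    return f'select p."{key}" from "{parent}" p where {where} order by p."{key}"'

pairs = referencing_columns(conn, "table_a")
unused = conn.execute(unused_rows_query(conn, "table_a")).fetchall()
print(pairs, unused)
```

The same idea should carry over to other databases by swapping the metadata lookup for that system's own catalog; the list of discovered pairs also doubles as a check on how many referencing tables there really are.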
I'm inclined to Marcelo Cantos' answer (and have upvoted it), but here is an alternative in an attempt to circumvent the problem of not having indexes on the foreign keys...
WITH
ids_a AS
(
SELECT id FROM myTable
)
,
ids_b AS
(
SELECT id FROM ids_a WHERE NOT EXISTS (SELECT * FROM table_a WHERE fk_id = ids_a.id)
)
,
ids_c AS
(
SELECT id FROM ids_b WHERE NOT EXISTS (SELECT * FROM table_b WHERE fk_id = ids_b.id)
)
,
...
,
ids_z AS
(
SELECT id FROM ids_y WHERE NOT EXISTS (SELECT * FROM table_y WHERE fk_id = ids_y.id)
)
SELECT * FROM ids_z
All I'm trying to do is suggest an order to Oracle to minimise its effort. Unfortunately, Oracle will compile this to something very similar to Marcelo Cantos' answer, and it may not perform any differently.