SQL - concatenated string in WHERE clause not working as expected - Redshift

This is the SQL (Redshift) I am referring to:
SELECT *
FROM table_a
WHERE col_a||col_b NOT IN
(
SELECT col_a||col_b
from table_b
);
There are values in table_a which don't exist in table_b, yet this query always returns no rows.
Any insight?

Use NOT EXISTS, not NOT IN. NOT IN returns no rows at all if any value in the subquery is NULL, and in Redshift col_a||col_b is NULL whenever either column is NULL. On top of that, comparing multiple columns by concatenating them is likely to be a bad method because of collisions: ('ab', 'c') and ('a', 'bc') both concatenate to 'abc'.
So use this:
SELECT a.*
FROM table_a a
WHERE NOT EXISTS (SELECT 1
FROM table_b b
WHERE b.col_a = a.col_a AND b.col_b = a.col_b
);
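
To see both problems side by side, here is a minimal sketch that runs the two queries through Python's sqlite3 module. SQLite is only a stand-in for Redshift (an assumption for illustration; both treat concatenation with NULL as NULL), and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for Redshift here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (col_a TEXT, col_b TEXT);
    CREATE TABLE table_b (col_a TEXT, col_b TEXT);
    INSERT INTO table_a VALUES ('x', 'y'), ('ab', 'c');
    -- ('a', 'bc') collides with ('ab', 'c') once concatenated;
    -- ('p', NULL) makes one concatenated subquery value NULL.
    INSERT INTO table_b VALUES ('a', 'bc'), ('p', NULL);
""")

# Original approach: concatenate and use NOT IN.
not_in_rows = conn.execute("""
    SELECT * FROM table_a
    WHERE col_a || col_b NOT IN (SELECT col_a || col_b FROM table_b)
""").fetchall()

# Suggested approach: compare column by column with NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT a.* FROM table_a a
    WHERE NOT EXISTS (SELECT 1 FROM table_b b
                      WHERE b.col_a = a.col_a AND b.col_b = a.col_b)
    ORDER BY a.col_a
""").fetchall()

print(not_in_rows)      # empty: the NULL and the 'abc' collision remove every row
print(not_exists_rows)  # both table_a rows, since neither pair exists in table_b
```

Neither row of table_a exists in table_b, yet only the NOT EXISTS version reports them.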

The subquery may be returning a NULL value, which makes the NOT IN condition fail.
Use coalesce to replace NULL with an appropriate value based on your data. I have used a ~ character in place of NULL:
SELECT *
FROM table_a
WHERE col_a||col_b NOT IN
(
SELECT coalesce (col_a,'~')||coalesce (col_b , '~')
from table_b
);
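
A small sketch of how this workaround behaves, again using Python's sqlite3 as a stand-in for Redshift (an assumption) with hypothetical sample rows. It also shows one caveat: the outer col_a||col_b is still NULL when either column is NULL, so such rows stay hidden unless the outer side is coalesced too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (col_a TEXT, col_b TEXT);
    CREATE TABLE table_b (col_a TEXT, col_b TEXT);
    INSERT INTO table_a VALUES ('x', 'y'), ('p', 'q'), ('r', NULL);
    INSERT INTO table_b VALUES ('p', 'q'), ('s', NULL);
""")

# coalesce only inside the subquery, as in the answer above
subquery_only = conn.execute("""
    SELECT * FROM table_a
    WHERE col_a || col_b NOT IN
          (SELECT coalesce(col_a, '~') || coalesce(col_b, '~') FROM table_b)
    ORDER BY col_a
""").fetchall()

# coalesce on both sides, so NULLs in table_a are compared as '~' as well
both_sides = conn.execute("""
    SELECT * FROM table_a
    WHERE coalesce(col_a, '~') || coalesce(col_b, '~') NOT IN
          (SELECT coalesce(col_a, '~') || coalesce(col_b, '~') FROM table_b)
    ORDER BY col_a
""").fetchall()

print(subquery_only)  # ('r', NULL) is missing: its outer expression is NULL
print(both_sides)     # ('r', NULL) is now reported as well
```

The sentinel has to be a value that cannot occur in the real data, otherwise it introduces collisions of its own.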

Related

Postgresql select from based on condition

How to run a given select statement based on condition?
If a condition (which comes from table_A) is true then select from table_B otherwise from table_C. Tables have no common column.
Something like this
select case when table_A.flag=true then
(select * from table_B )
else
(select * from table_C )
end
from table_A where ...
The above will fail, of course, with: more than one row returned by a subquery used as an expression
If the two tables have the same columns, you could use a UNION ALL. Something like:
SELECT *
FROM Table_B
WHERE (SELECT flag FROM Table_A) = true
UNION ALL
SELECT *
FROM Table_C
WHERE (SELECT flag FROM Table_A) <> true
I'm assuming here that Table_A has only one row, but you could adjust the subquery in the WHERE conditions to get the flag however you need it.
The basic idea is that you set up the two conditions so that only one of them is true at a time (based on your flag). So, even though it is a UNION, only one part of the query will return results and you either end up with Table_B or Table_C.
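
Here is a minimal sketch of that idea, run through Python's sqlite3 rather than PostgreSQL (an assumption for illustration). The flag is stored as 1/0 instead of a boolean, and the single-column tables and the helper name pick_rows are hypothetical.

```python
import sqlite3

def pick_rows(flag):
    """Return the rows chosen by the UNION ALL query for a given flag value."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (flag INTEGER);
        CREATE TABLE table_b (val TEXT);
        CREATE TABLE table_c (val TEXT);
        INSERT INTO table_b VALUES ('from_b');
        INSERT INTO table_c VALUES ('from_c');
    """)
    conn.execute("INSERT INTO table_a VALUES (?)", (flag,))
    # Only one of the two branches can pass its WHERE condition.
    rows = conn.execute("""
        SELECT * FROM table_b WHERE (SELECT flag FROM table_a) = 1
        UNION ALL
        SELECT * FROM table_c WHERE (SELECT flag FROM table_a) <> 1
    """).fetchall()
    conn.close()
    return rows

print(pick_rows(1))     # rows from table_b
print(pick_rows(0))     # rows from table_c
print(pick_rows(None))  # a NULL flag satisfies neither branch
```

Note the last call: if the flag can be NULL, neither comparison is true and the query returns nothing, so that case may need explicit handling.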

Snowflake, SQL where clause

I need to write a query with this WHERE clause:
where
pl.ods_site_id in (select id from table1 where ...)
But if the subquery (on table1) returns no rows, the WHERE clause should not filter anything (as if it evaluated to TRUE).
How can I do this? (I'm using the Snowflake SQL dialect.)
You could include a second condition:
where pl.ods_site_id in (select id from table1 where ...) or
not exists (select id from table1 where ...)
This explicitly checks for the subquery returning no rows.
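
A quick sketch of the three cases, using Python's sqlite3 as a stand-in for Snowflake (an assumption; the IN / NOT EXISTS logic itself is standard SQL). The helper name filtered_sites and the sample ids are hypothetical, and the elided subquery filter is left out.

```python
import sqlite3

def filtered_sites(table1_ids):
    """Run the IN-or-NOT-EXISTS filter against a table1 holding table1_ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pl (ods_site_id INTEGER);
        CREATE TABLE table1 (id INTEGER);
        INSERT INTO pl VALUES (1), (5);
    """)
    conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in table1_ids])
    rows = conn.execute("""
        SELECT ods_site_id FROM pl
        WHERE ods_site_id IN (SELECT id FROM table1)
           OR NOT EXISTS (SELECT id FROM table1)
        ORDER BY ods_site_id
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(filtered_sites([5]))  # subquery has a matching id: only that site
print(filtered_sites([]))   # subquery is empty: no filtering at all
print(filtered_sites([7]))  # subquery has rows but none match: nothing
```

The third case is worth checking against the requirement: the filter is only skipped when the subquery returns no rows at all, not when it returns rows that match nothing.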
If you are willing to use a join instead, Snowflake supports the QUALIFY clause, which might come in handy here. You can run this on Snowflake to see how it works.
with
pl (ods_site_id) as (select 1 union all select 5),
table1 (id) as (select 5) --change this to 7 to test if it returns ALL on no match
select a.*
from pl a
left join table1 b on a.ods_site_id = b.id -- and other conditions you want to add
qualify b.id = a.ods_site_id --either match the join condition
or count(b.id) over () = 0; --or make sure there is 0 match from table1

Find deleted rows: Not EXISTS vs Not IN

In my case, I have two tables with the same structure, TableA and TableB, and I am trying to find any records that exist in A but not in B.
My script was:
SELECT * FROM TableA
WHERE NOT EXISTS (
SELECT * FROM TableB
)
Although there are 2 records that exist only in A, this script returns nothing. Then I changed it to the following:
SELECT ID FROM TableA
WHERE ID NOT IN (
SELECT ID FROM TableB
)
This script works and returns the IDs of the 2 records.
My question is: is this behavior normal? What is the mechanism behind NOT EXISTS and NOT IN?
I have read some other posts comparing NOT EXISTS and NOT IN, and most people suggest using NOT EXISTS in 99.9% of scenarios. Does this case fall into the 0.1% where NOT EXISTS is not applicable? (I suspect I am simply using it wrongly; please correct me if that's the case.)
If you want to look at all the values in the rows, then use EXCEPT:
SELECT *
FROM TableA
EXCEPT
SELECT *
FROM TableB;
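Your NOT EXISTS returns nothing because the subquery is not correlated: it returns rows whenever TableB is non-empty, so the condition is false for every row of TableA. To use NOT EXISTS correctly, you need a correlation clause: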
SELECT a.*
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.id = a.id);
I strongly recommend using NOT EXISTS over NOT IN with a subquery. NOT IN will return no rows at all if b.id is ever NULL. That is usually not what is intended. NOT EXISTS matches the expected semantics.
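
The difference between the three forms can be checked with a small sketch. It uses Python's sqlite3 as a neutral engine (an assumption, since the question does not depend on a specific dialect) and made-up rows where ids 3 and 4 exist only in TableA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, name TEXT);
    CREATE TABLE TableB (id INTEGER, name TEXT);
    INSERT INTO TableA VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO TableB VALUES (1, 'a'), (2, 'b');
""")

# Uncorrelated: the subquery does not look at the current TableA row.
uncorrelated = conn.execute("""
    SELECT id FROM TableA
    WHERE NOT EXISTS (SELECT * FROM TableB)
""").fetchall()

# Correlated: the subquery is evaluated per TableA row.
correlated = conn.execute("""
    SELECT a.id FROM TableA a
    WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.id = a.id)
    ORDER BY a.id
""").fetchall()

# EXCEPT compares whole rows.
except_rows = conn.execute("""
    SELECT * FROM TableA
    EXCEPT
    SELECT * FROM TableB
    ORDER BY 1
""").fetchall()

print(uncorrelated)  # empty, because TableB has rows
print(correlated)    # ids that exist only in TableA
print(except_rows)   # full rows that exist only in TableA
```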
You need to be careful with the NOT IN expression.
The expression A NOT IN (B, C, D) basically means (A <> B AND A <> C AND A <> D). If any of the values is NULL, the expression can never be TRUE: it evaluates to NULL (or to FALSE when A equals one of the other values), so the row is filtered out.
So, for your example, the correct NOT IN expression would be (unless ID is a non-nullable column):
SELECT ID FROM TableA
WHERE ID NOT IN (
SELECT ID FROM TableB WHERE ID IS NOT NULL
)
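
The three-valued logic can be observed directly. This is a minimal sketch using Python's sqlite3 (an assumed stand-in engine), where SQL NULL comes back as None and TRUE/FALSE as 1/0; the sample ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NOT IN against literal lists: no NULL, NULL without a match, NULL with a match
truth_values = conn.execute(
    "SELECT 1 NOT IN (2, 3), 1 NOT IN (2, NULL), 1 NOT IN (1, NULL)"
).fetchone()
print(truth_values)  # TRUE, NULL, FALSE

conn.executescript("""
    CREATE TABLE TableA (ID INTEGER);
    CREATE TABLE TableB (ID INTEGER);
    INSERT INTO TableA VALUES (1), (2), (3), (4);
    INSERT INTO TableB VALUES (1), (2), (NULL);
""")

# A single NULL in TableB.ID hides every row.
with_null = conn.execute(
    "SELECT ID FROM TableA WHERE ID NOT IN (SELECT ID FROM TableB)"
).fetchall()

# Filtering the NULLs out restores the expected result.
without_null = conn.execute("""
    SELECT ID FROM TableA
    WHERE ID NOT IN (SELECT ID FROM TableB WHERE ID IS NOT NULL)
    ORDER BY ID
""").fetchall()

print(with_null)
print(without_null)
```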

Wrong SELECT subquery in 'IN' condition

I made a query like this
SELECT *
FROM TABLE_A
WHERE 1=1
AND ID_NO IN (
SELECT ID_NO
FROM TABLE_B
WHERE SEQ = '1'
)
The problem is that there is no column ID_NO in TABLE_B, so I expected the query to fail.
But the query worked, and I don't understand why.
Why didn't it raise an error?
The query is valid if TABLE_B does not have a column named ID_NO but TABLE_A does. In that case you have a correlated subquery, where the ID_NO in the subquery's SELECT list refers to the outer ID_NO column of TABLE_A (which presumably makes no sense, but is valid as far as the compiler is concerned).
Consider the following schema:
create table table_a (
id_no int
);
create table table_b (
other_id_no int
);
insert into table_a values (1),(2);
insert into table_b values (1),(3);
Then the following query will compile, but with the data above (table_b is not empty) it will always yield an empty result, because it actually means something like where id_no not in (id_no):
select * from table_a where id_no not in (select id_no from table_b);
When dealing with subqueries, I'd suggest using table aliases to avoid such unintended behaviour. For example, the following query does not compile, and the compiler gives you a hint about what is wrong:
select * from table_a a where a.id_no not in (select b.id_no from table_b b);
Error: Unknown column 'b.id_no' in 'field list'
Correcting the error then leads to:
select * from table_a a where a.id_no not in (select b.other_id_no from table_b b);
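
The same sequence can be reproduced with Python's sqlite3. This is a sketch under the assumption that SQLite resolves the unqualified name the same way as the database in the question; the exact error text differs from the message quoted above, so the code only captures it rather than matching a specific wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table_a (id_no int);
    create table table_b (other_id_no int);
    insert into table_a values (1), (2);
    insert into table_b values (1), (3);
""")

# Unqualified id_no silently binds to table_a, so every row excludes itself.
unqualified = conn.execute(
    "select * from table_a where id_no not in (select id_no from table_b)"
).fetchall()

# With aliases, the wrong column name is rejected instead of rebinding.
try:
    conn.execute(
        "select * from table_a a where a.id_no not in "
        "(select b.id_no from table_b b)"
    )
    alias_error = None
except sqlite3.OperationalError as exc:
    alias_error = str(exc)

# Corrected query using the column that table_b really has.
corrected = conn.execute(
    "select * from table_a a where a.id_no not in "
    "(select b.other_id_no from table_b b)"
).fetchall()

print(unqualified)  # empty result, no error
print(alias_error)  # the engine's complaint about b.id_no
print(corrected)    # only id_no = 2 is absent from table_b
```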

Select rows using EXCEPT - SQL Server

How can I return only the columns that differ when I use EXCEPT in SQL Server?
Example:
SELECT ID, NAME FROM TABLE_B
EXCEPT
SELECT ID, NAME FROM TABLE_A
In this case, if only the name is different, I want to return just the NAME column.
Your code is correct. You won't get any repeated rows (a row here being the combination of ID and NAME).
But if I understand correctly, you only want to focus on names. In that case, remove ID from the selected fields:
SELECT NAME FROM TABLE_B
EXCEPT
SELECT NAME FROM TABLE_A
[Edited, regarding a comment:]
This shows distinct rows from TABLE_B that aren’t in TABLE_A. This is the goal of using EXCEPT. For anything else, EXCEPT is not the solution.
In case you're looking for all different names from both tables, you can use:
select distinct NAME
from
(select NAME from TABLE_A
UNION
select NAME from TABLE_B) as T
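
To make the difference between the two EXCEPT queries concrete, here is a small sketch using Python's sqlite3 in place of SQL Server (an assumption; EXCEPT behaves the same way for this case). The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (ID INTEGER, NAME TEXT);
    CREATE TABLE TABLE_B (ID INTEGER, NAME TEXT);
    INSERT INTO TABLE_A VALUES (1, 'A'), (2, 'B');
    INSERT INTO TABLE_B VALUES (1, 'A'), (2, 'X'), (3, 'B');
""")

# Row-wise comparison: (ID, NAME) pairs of TABLE_B that are not in TABLE_A.
pairs_only_in_b = conn.execute("""
    SELECT ID, NAME FROM TABLE_B
    EXCEPT
    SELECT ID, NAME FROM TABLE_A
    ORDER BY 1
""").fetchall()

# Name-only comparison: names of TABLE_B that appear nowhere in TABLE_A.
names_only_in_b = conn.execute("""
    SELECT NAME FROM TABLE_B
    EXCEPT
    SELECT NAME FROM TABLE_A
    ORDER BY 1
""").fetchall()

print(pairs_only_in_b)  # (3, 'B') counts as different: the pair is new
print(names_only_in_b)  # 'B' is dropped here: the name exists in TABLE_A
```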
You can get a result set that flags data missing from the second table, in the form
ID  flag_ID  NAME  flag_Name
1   !        A     !          -- neither ID = 1 nor NAME = 'A' exists in the second table
3   NULL     NULL  !          -- ID = 3 exists, but no row with a NULL NAME exists
4   NULL     Y     NULL       -- both values exist, but never in the same row
and then filter on whatever criteria you need.
Assuming ID is NOT NULL, NAME is nullable, and NULLs should be considered "equal":
SELECT b.ID,
CASE WHEN NOT EXISTS (SELECT 1 FROM a t2 WHERE t2.ID=b.ID) THEN '!' END flag_ID,
b.NAME,
CASE WHEN NOT EXISTS (SELECT 1 FROM a t2
WHERE ISNULL(NULLIF(b.NAME, t2.NAME), NULLIF(t2.NAME, b.NAME)) IS NULL)
THEN '!' END flag_Name
FROM b
LEFT JOIN a ON a.ID = b.ID
AND ISNULL(NULLIF(a.NAME, b.NAME), NULLIF(b.NAME, a.NAME)) IS NULL
WHERE a.ID IS NULL
OR ISNULL(NULLIF(a.NAME, b.NAME), NULLIF(b.NAME, a.NAME)) IS NOT NULL
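
The ISNULL(NULLIF(x, y), NULLIF(y, x)) IS NULL pattern is a NULL-safe equality test. The sketch below checks its truth table with Python's sqlite3; because SQLite has no two-argument ISNULL function, COALESCE is swapped in (an assumption that it behaves like ISNULL for this purpose). The helper name null_safe_equal is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def null_safe_equal(x, y):
    """True when x and y are equal or both NULL, per the NULLIF pattern."""
    row = conn.execute(
        # COALESCE stands in for SQL Server's ISNULL(expr, replacement).
        "SELECT COALESCE(NULLIF(:x, :y), NULLIF(:y, :x)) IS NULL",
        {"x": x, "y": y},
    ).fetchone()
    return bool(row[0])

print(null_safe_equal("A", "A"))    # equal values
print(null_safe_equal(None, None))  # both NULL are treated as equal
print(null_safe_equal("A", None))   # one side NULL
print(null_safe_equal(None, "A"))   # the other side NULL
print(null_safe_equal("A", "B"))    # different values
```

Both NULLIF calls are needed: NULLIF(x, y) alone is NULL when x is NULL and y is not, which would wrongly count as a match.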