SQL: mix join with case and NULL

Pretty simple: what's the best way to fix the NULL filtering below, given that the = operator doesn't work with NULL?
When Data1 = -1, the join condition becomes Key2 = NULL, which never matches rows where Key2 is NULL.
LEFT JOIN myReferenceTable
on myReferenceTable.Key1 = myDataTable.FKey1
and myReferenceTable.Key2 =
CASE
WHEN myDataTable.Data1 = -1 THEN NULL
ELSE myDataTable.Data3 - myDataTable.Data4
END
Thanks!

Replace the CASE expression with AND/OR logic:
LEFT JOIN myReferenceTable
on myReferenceTable.Key1 = myDataTable.FKey1
and ((myDataTable.Data1 = -1 AND myReferenceTable.Key2 IS NULL)
OR (myDataTable.Data1 != -1 AND myReferenceTable.Key2 = myDataTable.Data3 - myDataTable.Data4))
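To make the difference concrete, here is a minimal sketch that runs both join conditions against an in-memory SQLite database from Python. The table and column names follow the question; the sample rows, the Label column, and the use of SQLite in place of the asker's (unnamed) engine are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myDataTable (FKey1 INTEGER, Data1 INTEGER, Data3 INTEGER, Data4 INTEGER);
    CREATE TABLE myReferenceTable (Key1 INTEGER, Key2 INTEGER, Label TEXT);
    -- Sample rows are invented for this sketch.
    INSERT INTO myDataTable VALUES (1, -1, 0, 0), (1, 5, 7, 4);
    INSERT INTO myReferenceTable VALUES (1, NULL, 'null-key'), (1, 3, 'three');
""")

# Original form: Key2 = NULL is never true, so the Data1 = -1 row finds no match.
case_join = """
    SELECT d.Data1, r.Label
    FROM myDataTable d
    LEFT JOIN myReferenceTable r
      ON r.Key1 = d.FKey1
     AND r.Key2 = CASE WHEN d.Data1 = -1 THEN NULL ELSE d.Data3 - d.Data4 END
    ORDER BY d.Data1
"""

# AND/OR form: the NULL case is tested with IS NULL instead of =.
and_or_join = """
    SELECT d.Data1, r.Label
    FROM myDataTable d
    LEFT JOIN myReferenceTable r
      ON r.Key1 = d.FKey1
     AND ((d.Data1 = -1 AND r.Key2 IS NULL)
       OR (d.Data1 != -1 AND r.Key2 = d.Data3 - d.Data4))
    ORDER BY d.Data1
"""

case_rows = conn.execute(case_join).fetchall()
and_or_rows = conn.execute(and_or_join).fetchall()
print(case_rows)    # [(-1, None), (5, 'three')]
print(and_or_rows)  # [(-1, 'null-key'), (5, 'three')]
```

With the CASE form the Data1 = -1 row comes back with no reference row attached; with the AND/OR form it picks up the reference row whose Key2 is NULL.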

If you're using MySQL, you can use the "null-safe equal" operator.
From the manual:
<=>
NULL-safe equal. This operator performs an equality comparison like
the = operator, but returns 1 rather than NULL if both operands are
NULL, and 0 rather than NULL if one operand is NULL.
mysql> SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
-> 1, 1, 0
mysql> SELECT 1 = 1, NULL = NULL, 1 = NULL;
-> 1, NULL, NULL
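The same comparison can be tried without a MySQL server. The sketch below uses SQLite from Python as a stand-in, which is an assumption on my part: SQLite has no <=> operator, but its IS operator is NULL-safe in the same way for these operands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite's IS operator is NULL-safe, playing the role of MySQL's <=> here.
null_safe = conn.execute("SELECT 1 IS 1, NULL IS NULL, 1 IS NULL").fetchone()

# Plain = yields NULL (None in Python) whenever an operand is NULL.
plain = conn.execute("SELECT 1 = 1, NULL = NULL, 1 = NULL").fetchone()

print(null_safe)  # (1, 1, 0)
print(plain)      # (1, None, None)
```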

My understanding is that you are trying to join myReferenceTable to myDataTable on Key1 and Key2 (Key1 is a direct match, and Key2 matches the difference between Data3 and Data4), but you want to explicitly exclude rows where Data1 = -1. It seems you may also want NULL returned when Data1 = -1, but what gets returned depends on what's in your SELECT list. We're talking about the join here, and you're doing a LEFT JOIN to myReferenceTable, which tells me you want all rows from myDataTable returned, whether or not there are matches in myReferenceTable. Having said that, I would write the join as:
LEFT JOIN myReferenceTable
on myReferenceTable.Key1 = myDataTable.FKey1
and myDataTable.Data1 <> -1
AND myReferenceTable.Key2 = myDataTable.Data3 - myDataTable.Data4

Related

PostgreSQL - Update with JOIN - inner or outer?

I come from SQL Server and I am migrating some T-SQL code to Postgres.
In PostgreSQL, I now have this UPDATE statement (see below).
In there:
"#reportdata" is a temporary table
kwt.Report is a normal table
This part of the WHERE clause is doing an implicit JOIN. I think that's what it is called in Postgres.
(cr.campaignid = rp.campaignid AND cr.reportdate = rp.reportdate)
That is because this couple (campaignid, reportdate) represents a unique logical key in kwt.Report. Also, both columns are not nullable in kwt.Report.
In "#reportdata" both columns can be NULL.
My question is: when I see such an implicit join in an UPDATE statement, I am never quite sure whether it is an INNER or an OUTER join. I think it's INNER, and there's no way it could be OUTER, but I just want to be sure.
Could someone please confirm?
I mean, if rp.campaignid is NULL, there's no way this condition can evaluate to true, right?
(cr.campaignid = rp.campaignid AND cr.reportdate = rp.reportdate)
I am asking because I am not sure whether comparison with NULL works the same way in Postgres as in SQL Server. As far as I recall, in SQL Server NULL = a always evaluates to NULL (not to true (bit 1), not to false (bit 0), but to NULL). Please correct me if this understanding is wrong. Is it the same in Postgres?
UPDATE kwt.Report cr
SET
impressions = rp.impressions,
clicks = rp.clicks,
views = rp.views
FROM
"#reportdata" AS rp
WHERE
(cr.campaignid = rp.campaignid AND cr.reportdate = rp.reportdate)
AND (rp.campaignid IS NOT NULL);
In SQL, A = NULL is neither true nor false; it evaluates to unknown.
You can check this with:
with cte0 as
(
select '1' as c
), cte1 as
(
select null as c
)
select * from cte0
inner join cte1 on cte0.c = cte1.c
union
select * from cte0
inner join cte1 on cte0.c != cte1.c
Both branches return no rows:
c | c
:- | :-

Finding rows where two values not equal, with nulls

I'm interested to know what the common practices are for this situation.
You need to find all rows where two columns do not match. Both columns are nullable, and rows where both columns are NULL should be excluded. None of these methods will work:
WHERE A <> B --does not include any NULLs
WHERE NOT (A = B) --does not include any NULLs
WHERE (A <> B OR A IS NULL OR B IS NULL) --includes NULL, NULL
Except this...it does work, but I don't know if there is a performance hit...
WHERE COALESCE(A, '') <> COALESCE(B, '')
Lately I've started using this logic...it's clean, simple and works...would this be considered the common way to handle it?:
WHERE IIF(A = B, 1, 0) = 0
--OR
WHERE CASE WHEN A = B THEN 1 ELSE 0 END = 0
This is a bit painful, but I would advise direct boolean logic:
where (A <> B) or (A is null and B is not null) or (A is not null and B is null)
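or, more compactly (but note that this form evaluates to unknown, and so drops the row, when exactly one of A and B is NULL):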
where not (A = B or A is null and B is null)
It would be much simpler if SQL Server implemented IS DISTINCT FROM, the ANSI-standard NULL-safe operator.
If you use coalesce(), a typical method is:
where coalesce(A, '') <> coalesce(B, '')
This is used because '' will convert to most types.
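A quick way to compare these predicates is to run each one against the four possible cases (equal, different, one side NULL, both NULL). The sketch below does that with SQLite from Python; SQLite is only a stand-in for SQL Server here, the table t and its rows are invented, and IIF is left out because the CASE form expresses the same test. The wanted result is rows 2 and 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, A TEXT, B TEXT);
    INSERT INTO t VALUES
        (1, 'x', 'x'),    -- equal
        (2, 'x', 'y'),    -- different
        (3, 'x', NULL),   -- exactly one side NULL
        (4, NULL, NULL);  -- both NULL
""")

# Predicates taken from the question and the answer above.
predicates = {
    "naive": "A <> B",
    "explicit": "(A <> B) OR (A IS NULL AND B IS NOT NULL) OR (A IS NOT NULL AND B IS NULL)",
    "compact": "NOT (A = B OR A IS NULL AND B IS NULL)",
    "case": "CASE WHEN A = B THEN 1 ELSE 0 END = 0",
    "coalesce": "COALESCE(A, '') <> COALESCE(B, '')",
}

def matching_ids(predicate):
    """Return the ids of the rows that satisfy the given WHERE predicate."""
    rows = conn.execute(f"SELECT id FROM t WHERE {predicate} ORDER BY id")
    return [row[0] for row in rows]

results = {name: matching_ids(sql) for name, sql in predicates.items()}
for name, ids in results.items():
    print(name, ids)
```

On this sample data only the explicit boolean form and the COALESCE form return rows 2 and 3. The CASE form also returns row 4 (both NULL), and the compact NOT (...) form misses row 3. The COALESCE form would additionally treat NULL and '' as equal, which matters if empty strings are real values in your data.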
How about using EXCEPT?
For example, to get all pairs of a and b where a and b are not equal, excluding any rows where a or b is NULL:
select a, b from tableX where a is not null and b is not null
except
select a,b from tableX where a=b

Unknown processing time of SQL statement

I have a query like this
SELECT
a.LeaseNo, RIGHT(a.LeaseNo, 1) AS idx
FROM
Leases a
LEFT OUTER JOIN
DebitNoteItems b ON a.leaseno = b.LeaseNo
WHERE
status = 'A'
AND PortfolioType = 'R'
AND b.NoteItemID IS NULL
Result set is found in less than 1 second.
Then I tried to find the records with idx = 0, so I wrote
SELECT LeaseNo
FROM
(SELECT
a.LeaseNo, RIGHT(a.LeaseNo, 1) AS idx
FROM
Leases a
LEFT OUTER JOIN
DebitNoteItems b ON a.leaseno = b.LeaseNo
WHERE
status = 'A'
AND PortfolioType = 'R'
AND b.NoteItemID IS NULL) tmp
WHERE
tmp.idx = '0'
However, this query is very slow.
Then I tried
SELECT LeaseNo
FROM
(SELECT
a.LeaseNo, RIGHT(a.LeaseNo, 1) AS idx
FROM
Leases a
LEFT OUTER JOIN
DebitNoteItems b ON a.leaseno = b.LeaseNo
WHERE
status = 'A'
AND PortfolioType = 'R'
AND b.NoteItemID IS NULL) tmp
WHERE
tmp.idx LIKE '%0%'
This one is executed in less than 1 second.
I want to know why the LIKE version is so much faster than the = version when only one simple condition differs (= '0' versus LIKE '%0%'). This is not a difference of a few seconds: the LIKE query finishes in less than 1 second, while the = query runs for more than a minute (in fact it was still running when I terminated it manually). It doesn't look like it is simply comparing idx over the rows of the derived table tmp.
Is there something wrong or inappropriate in the query?

2 queries that should return the same data but don't

I have 2 Informix queries that I believe should return the same data but do not. The first query uses a subquery as a filter and improperly returns no rows. The second is performed using a left outer join checking for null on the same column used in the subquery and it properly returns the correct data set. Am I missing something or is this a bug?
select i.invoice_date, oe.commit_no
from oe
join invoice i
on oe.invoice_no = i.invoice_no
where i.invoice_date > today - 60
and oe.commit_no not in (select commit_no from bolx)
select i.invoice_date, oe.commit_no, bolx.bol_no
from oe
join invoice i
on oe.invoice_no = i.invoice_no
left join bolx
on bolx.commit_no = oe.commit_no
where i.invoice_date > today - 60
and bolx.commit_no is null
Abbreviated schemas (this is a legacy db, so it's got some quirks):
invoice
invoice_no char(9),
invoice_date date
oe
commit_no decimal(8, 0),
invoice_no char(9)
bolx
commit_no decimal(8, 0)
Any time I read "Not In ... subquery ... returns no rows" I'm pretty sure I know the answer!
I presume select commit_no from bolx returns some NULL values?
The presence of a NULL in a NOT IN guarantees that no results will be returned.
foo NOT IN (bar, NULL) is equivalent to
foo <> bar and foo <> NULL
The foo <> NULL part will always evaluate to unknown and the AND can never evaluate to true unless all conditions evaluate to true.
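The effect is easy to reproduce. The sketch below uses SQLite from Python rather than Informix (an assumption, though NOT IN follows the same three-valued logic in both), with made-up rows in cut-down versions of oe and bolx. It compares the plain NOT IN, a NOT IN whose subquery filters out NULLs (a common workaround, not something proposed above), and the LEFT JOIN ... IS NULL form from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oe (commit_no INTEGER);
    CREATE TABLE bolx (commit_no INTEGER);
    -- Sample rows are invented; note the NULL in bolx.
    INSERT INTO oe VALUES (1), (2);
    INSERT INTO bolx VALUES (1), (NULL);
""")

# The NULL in the subquery makes NOT IN unknown for every non-matching row.
not_in = conn.execute(
    "SELECT commit_no FROM oe "
    "WHERE commit_no NOT IN (SELECT commit_no FROM bolx)"
).fetchall()

# Filtering NULLs out of the subquery restores the expected result.
not_in_filtered = conn.execute(
    "SELECT commit_no FROM oe "
    "WHERE commit_no NOT IN (SELECT commit_no FROM bolx WHERE commit_no IS NOT NULL)"
).fetchall()

# The anti-join from the question is not affected by the NULL.
left_join = conn.execute(
    "SELECT oe.commit_no FROM oe "
    "LEFT JOIN bolx ON bolx.commit_no = oe.commit_no "
    "WHERE bolx.commit_no IS NULL"
).fetchall()

print(not_in)           # []
print(not_in_filtered)  # [(2,)]
print(left_join)        # [(2,)]
```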

SQL Server: Logical equivalent of ALL query

I have the following query (simplified):
SELECT
Id
FROM
dbo.Entity
WHERE
1 = ALL (
SELECT
CASE
WHEN {Condition} THEN 1
ELSE 0
END
FROM
dbo.Related
INNER JOIN dbo.Entity AS TargetEntity ON
TargetEntity.Id = Related.TargetId
WHERE
Related.SourceId = Entity.Id
)
where {Condition} is a complex dynamic condition on TargetEntity.
In simple terms, this query should return entities for which all related entities match the required condition.
Unfortunately, that does not work quite well, since by SQL standard 1 = ALL evaluates to TRUE when ALL is applied to an empty set. I know I can add AND EXISTS, but that will require me to repeat the whole subquery, which, I am certain, will cause problems for performance.
How should I rewrite the query to achieve the result I need (SQL Server 2008)?
Thanks in advance.
Note: practically speaking, the whole query is highly dynamic, so the perfect solution would be to rewrite only 1 = ALL ( ... ), since changing top-level select can cause problems when additional conditions are added to top-level where.
Couldn't you use a min to achieve this?
EG:
SELECT
Id
FROM
dbo.Entity
WHERE
1 = (
SELECT
MIN(CASE
WHEN {Condition} THEN 1
ELSE 0
END)
FROM
dbo.Related
INNER JOIN dbo.Entity AS TargetEntity ON
TargetEntity.Id = Related.TargetId
WHERE
Related.SourceId = Entity.Id
)
The MIN should return NULL if there are no related rows, 1 if every row yields 1, and 0 if any row yields 0. Comparing the result to 1 is therefore only true when all rows yield 1.
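Here is a small sketch of the MIN approach, run on SQLite from Python instead of SQL Server 2008 (an assumption; MIN over an empty set is NULL in both). The Entity and Related tables are reduced to the columns needed, and the hypothetical Ok column stands in for {Condition}.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Entity (Id INTEGER, Ok INTEGER);  -- Ok is a stand-in for {Condition}
    CREATE TABLE Related (SourceId INTEGER, TargetId INTEGER);
    INSERT INTO Entity VALUES (1, 1), (2, 1), (3, 0), (4, 1);
    -- Entity 1 -> targets 2 and 4 (both ok); entity 2 -> target 3 (not ok);
    -- entities 3 and 4 have no related rows at all.
    INSERT INTO Related VALUES (1, 2), (1, 4), (2, 3);
""")

query = """
    SELECT e.Id
    FROM Entity AS e
    WHERE 1 = (
        SELECT MIN(CASE WHEN te.Ok = 1 THEN 1 ELSE 0 END)
        FROM Related AS r
        INNER JOIN Entity AS te ON te.Id = r.TargetId
        WHERE r.SourceId = e.Id
    )
    ORDER BY e.Id
"""

ids = [row[0] for row in conn.execute(query)]
print(ids)  # [1]
```

Entities 3 and 4 drop out because MIN over no rows is NULL, and 1 = NULL is not true; entity 2 drops out because its MIN is 0.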
The requirement can be restated as: pick entities for which no related entity fails the condition.
This can be accomplished by:
SELECT
Id
FROM
dbo.Entity
WHERE
NOT EXISTS (
-- as soon as one related element does not match the condition, skip this entity
SELECT TOP 1 1
FROM
dbo.Related
INNER JOIN dbo.Entity AS TargetEntity ON
TargetEntity.Id = Related.TargetId
WHERE
Related.SourceId = Entity.Id AND
CASE
WHEN {Condition} THEN 1
ELSE 0
END = 0
)
EDIT: depending on the condition, you may be able to write something like
WHERE Related.SourceId = Entity.Id AND NOT {Condition}, if that doesn't add too much complexity to the query.
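Instead of using ALL, change your query to compare the result of the subquery directly (this only works if the subquery returns at most one row; with more rows SQL Server raises an error):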
select Id
from dbo.Entity
where 1 = (
select
case
when ... then 1
else 0
end
from ...
where ...
)
Probably this will work: WHERE NOT 0 = ANY(...)
If I read the query correctly, it can be simplified to something like:
SELECT e.Id
FROM dbo.Entity e
INNER JOIN dbo.Related r ON r.SourceId = e.Id
INNER JOIN dbo.Entity te ON te.Id = r.TargetId
WHERE <extra where stuff>
GROUP BY e.Id
HAVING SUM(CASE WHEN {Condition} THEN 1 ELSE 0 END) = COUNT(*)
This says the Condition must be true for all rows. It filters the "empty" set case away with the INNER JOINs.
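A minimal sketch of the GROUP BY / HAVING rewrite, again on SQLite from Python as a stand-in for SQL Server. The sample rows and the hypothetical Ok column (playing the role of {Condition}) are assumptions, and the extra WHERE conditions are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Entity (Id INTEGER, Ok INTEGER);  -- Ok is a stand-in for {Condition}
    CREATE TABLE Related (SourceId INTEGER, TargetId INTEGER);
    INSERT INTO Entity VALUES (1, 1), (2, 1), (3, 0), (4, 1);
    -- Entity 1 -> targets 2 and 4 (both ok); entity 2 -> target 3 (not ok);
    -- entities 3 and 4 have no related rows and vanish in the INNER JOIN.
    INSERT INTO Related VALUES (1, 2), (1, 4), (2, 3);
""")

query = """
    SELECT e.Id
    FROM Entity AS e
    INNER JOIN Related AS r ON r.SourceId = e.Id
    INNER JOIN Entity AS te ON te.Id = r.TargetId
    GROUP BY e.Id
    -- Keep a group only if every joined row satisfies the condition.
    HAVING SUM(CASE WHEN te.Ok = 1 THEN 1 ELSE 0 END) = COUNT(*)
    ORDER BY e.Id
"""

having_ids = [row[0] for row in conn.execute(query)]
print(having_ids)  # [1]
```

Note that this changes the top-level SELECT (it adds joins and a GROUP BY), which the question's note says may be awkward when further conditions are appended dynamically.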