I'm interested to know what the common practices are for this situation.
You need to find all rows where two columns do not match. Both columns are nullable, and rows where both columns are NULL should be excluded. None of these methods will work:
WHERE A <> B --does not include any NULLs
WHERE NOT (A = B) --does not include any NULLs
WHERE (A <> B OR A IS NULL OR B IS NULL) --includes NULL, NULL
Except this...it does work, but I don't know if there is a performance hit...
WHERE COALESCE(A, '') <> COALESCE(B, '')
Lately I've started using this logic...it's clean, simple and works...would this be considered the common way to handle it?:
WHERE IIF(A = B, 1, 0) = 0
--OR
WHERE CASE WHEN A = B THEN 1 ELSE 0 END = 0
This is a bit painful, but I would advise direct boolean logic:
where (A <> B) or (A is null and B is not null) or (A is not null and B is null)
or:
where not (A = B or A is null and B is null)
It would be much simpler with IS DISTINCT FROM, the ANSI-standard NULL-safe operator, but SQL Server only supports it from SQL Server 2022 onward.
If you use coalesce(), a typical method is:
where coalesce(A, '') <> coalesce(B, '')
This is used because '' will convert to most types. Note that it treats NULL and the empty string as equal.
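To compare the candidate predicates side by side, here is a minimal sketch using Python's built-in sqlite3 module. The table t and its five rows are made up for illustration, and SQLite stands in for SQL Server here: IIF is written as the equivalent CASE expression, and SQLite's IS NOT operator plays the role of IS DISTINCT FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a TEXT, b TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "x", "x"),    # equal
        (2, "x", "y"),    # different
        (3, "x", None),   # only b is NULL
        (4, None, "y"),   # only a is NULL
        (5, None, None),  # both NULL
    ],
)

predicates = {
    "plain <>": "a <> b",
    "boolean logic": "a <> b OR (a IS NULL AND b IS NOT NULL)"
                     " OR (a IS NOT NULL AND b IS NULL)",
    "coalesce": "COALESCE(a, '') <> COALESCE(b, '')",
    "case": "CASE WHEN a = b THEN 1 ELSE 0 END = 0",
    "null-safe IS NOT": "a IS NOT b",  # SQLite's NULL-safe inequality
}

results = {}
for name, predicate in predicates.items():
    rows = conn.execute(f"SELECT id FROM t WHERE {predicate} ORDER BY id")
    results[name] = [row[0] for row in rows]
    print(f"{name:18} -> {results[name]}")
```

On this data the boolean-logic, coalesce, and IS NOT forms return ids 2, 3 and 4. The CASE form also returns id 5, because NULL = NULL is not true and so falls through to the ELSE branch, which matters if rows where both columns are NULL must be excluded.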
How about using EXCEPT?
For example, if I want to get all a and b where a = b does not hold, and exclude all NULL values of a and b:
select a, b from tableX where a is not null and b is not null
except
select a,b from tableX where a=b
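A small sketch of what that EXCEPT query returns, run through Python's sqlite3 module. The rows in tableX are invented for illustration, including one duplicate pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?)",
    [("x", "x"), ("x", "y"), ("x", "y"), ("x", None), (None, None)],
)

rows = conn.execute(
    """
    SELECT a, b FROM tableX WHERE a IS NOT NULL AND b IS NOT NULL
    EXCEPT
    SELECT a, b FROM tableX WHERE a = b
    """
).fetchall()
print(rows)
```

Two things are worth noticing: every row containing a NULL is dropped by the first SELECT, and EXCEPT returns distinct rows, so the duplicated ('x', 'y') pair appears only once.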
For example, we have the following query:
select A.id, A.field > (case when A.cond1 then
(case when B.field is null then 0 else B.field end) else
(case when C.field is null then 0 else C.field end) end)
from A left join B on B.A_id = A.id left join C on C.A_id = A.id;
Is there a way to simplify replacing NULL with 0?
Yes, with COALESCE:
COALESCE(column_that_might_be_null, 0)
It accepts multiple arguments and works left to right returning the first non null:
COALESCE(column_that_might_be_null, another_possible_null, third_maybe_null, 0)
Is the equivalent of
CASE
WHEN column_that_might_be_null IS NOT NULL THEN column_that_might_be_null
WHEN another_possible_null IS NOT NULL THEN another_possible_null
WHEN third_maybe_null IS NOT NULL THEN third_maybe_null
ELSE 0
END
And it is standard SQL, so it should work on all compliant databases.
Some databases also offer ISNULL (SQL Server) or IFNULL (MySQL, SQLite), which work similarly (PostgreSQL itself has neither as a two-argument function), but I don't usually recommend them over COALESCE because they don't exist in all databases, don't necessarily work equivalently, and aren't as flexible because they only accept 2 arguments. For me, saving 2 characters isn't enough to justify that. (And if you forget about COALESCE and end up doing ISNULL(ISNULL(ISNULL(first, second), third), 0), the SQL is messier.)
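The left-to-right behaviour can be checked with a quick sketch in Python's sqlite3 module. The table vals and its columns col_a, col_b, col_c are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vals (id INTEGER, col_a INTEGER, col_b INTEGER, col_c INTEGER)"
)
conn.executemany(
    "INSERT INTO vals VALUES (?, ?, ?, ?)",
    [
        (1, 5, 6, 7),           # first argument is non-NULL
        (2, None, 6, 7),        # falls through to the second
        (3, None, None, 7),     # falls through to the third
        (4, None, None, None),  # everything NULL, so the literal 0 wins
    ],
)

picked = conn.execute(
    "SELECT id, COALESCE(col_a, col_b, col_c, 0) FROM vals ORDER BY id"
).fetchall()
print(picked)
```

Applied to the query in the question, each inner CASE would collapse to COALESCE(B.field, 0) and COALESCE(C.field, 0).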
I have the following SQL Server query:
SELECT TOP (100) PERCENT
dbo.cct_prod_plc_log_data.wc,
dbo.cct_prod_plc_log_data.loc,
dbo.cct_prod_plc_log_data.ord_no,
dbo.cct_prod_plc_log_data.ser_lot_no,
dbo.cct_prod_plc_log_data.line,
ISNULL(dbo.imlsmst_to_sfdtlfil.ItemNo, '') AS ItemNo,
ISNULL(dbo.imlsmst_to_sfdtlfil.BldSeqNo, '') AS BldSeqNo,
ISNULL(dbo.imlsmst_to_sfdtlfil.BldOrdNo, '') AS BldOrdNo,
ISNULL(dbo.imlsmst_to_sfdtlfil.StringItemNo, '') AS StringItemNo,
ISNULL(dbo.imlsmst_to_sfdtlfil.StringSerLotNo, '') AS StringSerLotNo,
MAX(dbo.cct_prod_plc_log_data.InsertDateTime) AS LatestDateTime,
MIN(ISNULL(dbo.cct_prod_plc_log_data.erp_transaction_id, 0)) AS MinimumErpID,
ISNULL(dbo.imlsmst_to_sfdtlfil.QtyOnHand, 0) AS QtyOnHand
FROM
dbo.cct_prod_plc_log_data
LEFT OUTER JOIN dbo.imlsmst_to_sfdtlfil
ON dbo.cct_prod_plc_log_data.ser_lot_no = dbo.imlsmst_to_sfdtlfil.SerLotNo
AND dbo.cct_prod_plc_log_data.ord_no = dbo.imlsmst_to_sfdtlfil.OrderNo
AND dbo.cct_prod_plc_log_data.line = dbo.imlsmst_to_sfdtlfil.Bin
WHERE
( dbo.cct_prod_plc_log_data.erp_transaction_id < 3 OR dbo.cct_prod_plc_log_data.erp_transaction_id IS NULL )
AND (dbo.cct_prod_plc_log_data.wc <> '')
AND (dbo.cct_prod_plc_log_data.loc <> '')
AND (dbo.cct_prod_plc_log_data.line <> '')
GROUP BY
dbo.cct_prod_plc_log_data.wc,
dbo.cct_prod_plc_log_data.loc,
dbo.cct_prod_plc_log_data.ord_no,
dbo.cct_prod_plc_log_data.ser_lot_no,
dbo.cct_prod_plc_log_data.line,
dbo.imlsmst_to_sfdtlfil.ItemNo,
dbo.imlsmst_to_sfdtlfil.BldSeqNo,
dbo.imlsmst_to_sfdtlfil.BldOrdNo,
dbo.imlsmst_to_sfdtlfil.StringItemNo,
dbo.imlsmst_to_sfdtlfil.StringSerLotNo,
dbo.imlsmst_to_sfdtlfil.QtyOnHand
ORDER BY dbo.cct_prod_plc_log_data.ord_no DESC
It contains a LEFT OUTER JOIN between the two tables on 3 fields. As currently written, if any of the 3 joined fields in the right table (dbo.imlsmst_to_sfdtlfil) is NULL or missing, the fields selected from that table come back NULL.
How do I determine which of the 3 fields is the field that caused the join to fail? I would like to differentiate these from each other. Thanks.
(Ex. ser_lot_no and ord_no exists but bin is null vs bin and ord_no exist but ser_lot_no is null. )
Change it for an inner join and comment out all but one of the conditions, then uncomment them one at a time until the data disappears again - that's the faulty condition. If there was no data even with just one condition, that's the faulty condition:
SELECT
c.wc,
c.loc,
c.ord_no,
c.ser_lot_no,
c.line,
COALESCE(i.ItemNo, '') AS ItemNo,
COALESCE(i.BldSeqNo, '') AS BldSeqNo,
COALESCE(i.BldOrdNo, '') AS BldOrdNo,
COALESCE(i.StringItemNo, '') AS StringItemNo,
COALESCE(i.StringSerLotNo, '') AS StringSerLotNo,
MAX(c.InsertDateTime) AS LatestDateTime,
MIN(COALESCE(c.erp_transaction_id, 0)) AS MinimumErpID,
COALESCE(i.QtyOnHand, 0) AS QtyOnHand
FROM
dbo.cct_prod_plc_log_data c
INNER JOIN dbo.imlsmst_to_sfdtlfil i
ON
c.ser_lot_no = i.SerLotNo
--AND c.ord_no = i.OrderNo
--AND c.line = i.Bin
WHERE
( c.erp_transaction_id < 3 OR c.erp_transaction_id IS NULL )
AND (c.wc <> '')
AND (c.loc <> '')
AND (c.line <> '')
GROUP BY
c.wc,
c.loc,
c.ord_no,
c.ser_lot_no,
c.line,
COALESCE(i.ItemNo, ''),
COALESCE(i.BldSeqNo, ''),
COALESCE(i.BldOrdNo, ''),
COALESCE(i.StringItemNo, ''),
COALESCE(i.StringSerLotNo, ''),
COALESCE(i.QtyOnHand, 0)
ORDER BY c.ord_no DESC
Using INNER JOIN is more obvious than OUTER JOIN, as most query tools give a row count, and it's easier to see the row count changing from 99990 to 100000 than it is to eyeball 100000 rows looking for 10 that are null when they shouldn't be.
If you have more than 2 tables, comment out your select block, put a *, and all but 2 tables:
SELECT *
/* columns,list,here,blah,blah */
FROM
table1
JOIN table2 ON ...
--JOIN table3 on ...
--JOIN table4 on ...
Run it, get the expected number of rows, then proceed uncommenting more and more tables. If at any point your row count changes unexpectedly (more when you expected less, or less when you expected more) investigate.
If the row count increases, it's probably a cartesian product and should be resolved by adding extra join conditions, not by whacking a DISTINCT in.
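The "add one condition at a time and watch the row count" approach can be sketched like this, using Python's sqlite3 module. The tables c and i are cut-down, hypothetical stand-ins for the two real tables, with three invented rows each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c (id INTEGER, ser_lot_no TEXT, ord_no TEXT, line TEXT);
    CREATE TABLE i (id INTEGER, SerLotNo TEXT, OrderNo TEXT, Bin TEXT);
    INSERT INTO c VALUES (1, 'S1', 'O1', 'L1'), (2, 'S2', 'O2', 'L2'), (3, 'S3', 'O3', 'L3');
    -- row 2 has a wrong Bin, row 3 has a wrong OrderNo
    INSERT INTO i VALUES (1, 'S1', 'O1', 'L1'), (2, 'S2', 'O2', 'LX'), (3, 'S3', 'OX', 'L3');
""")

conditions = [
    "c.ser_lot_no = i.SerLotNo",
    "c.ord_no = i.OrderNo",
    "c.line = i.Bin",
]

counts = []
for n in range(1, len(conditions) + 1):
    # Enable the first n conditions, as if uncommenting them one by one.
    on_clause = " AND ".join(conditions[:n])
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM c INNER JOIN i ON {on_clause}"
    ).fetchone()
    counts.append(count)
    print(f"{n} condition(s): {count} rows  [{on_clause}]")
```

Each drop in the count points at the condition that was just enabled.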
Other top tips:
Use COALESCE rather than ISNULL; improve your database cross skilling
Alias tables and use the alias name, rather than repeating the schema and column name everywhere
GROUP BY the coalesced result rather than the column, if you're using a DB that draws a distinction between empty string and null string, otherwise you'll end up with two rows in your results when you expect 1
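The last tip can be illustrated with a short sketch in Python's sqlite3 module. SQLite, like SQL Server, treats the empty string and NULL as different values; the one-column table i and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE i (ItemNo TEXT)")
conn.executemany("INSERT INTO i VALUES (?)", [("A1",), ("",), (None,)])

# Grouping by the raw column keeps '' and NULL in separate groups,
# even though both are displayed as '' after COALESCE.
by_column = conn.execute(
    "SELECT COALESCE(i.ItemNo, '') AS shown, COUNT(*) "
    "FROM i GROUP BY i.ItemNo ORDER BY 1"
).fetchall()

# Grouping by the coalesced expression merges them into one group.
by_coalesce = conn.execute(
    "SELECT COALESCE(i.ItemNo, '') AS shown, COUNT(*) "
    "FROM i GROUP BY COALESCE(i.ItemNo, '') ORDER BY 1"
).fetchall()

print(by_column)
print(by_coalesce)
```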
Edit: You said:
Thank you for the insight and tips. However, my problem was more so a question on how to incorporate the information on which field was causing the join to fail as a permanent addition rather than a one time audit. Any insight for that? –
And I say:
You can't feasibly do this. The database cannot tell you "which field" isn't working out, because most of them aren't working out. To see what I mean, run this:
SELECT
-- replace .id with the name of the pk column
CONCAT('Cannot join c[', c.id, '] to i[', i.id, '] because: ',
CASE
WHEN COALESCE(c.ser_lot_no, 'null') != COALESCE(i.SerLotNo, 'null ') THEN 'c.ser_lot_no != i.SerLotNo, '
END,
CASE
WHEN COALESCE(c.ord_no, 'null') != COALESCE(i.OrderNo, 'null ') THEN 'c.ord_no != i.OrderNo, '
END,
CASE
WHEN COALESCE(c.line, 'null') != COALESCE(i.Bin, 'null ') THEN 'c.line != i.Bin, '
END
)
FROM
dbo.cct_prod_plc_log_data c
CROSS JOIN dbo.imlsmst_to_sfdtlfil i
It asks the database to join every row to every other row, then look at the values on each pair and work out whether it can be joined or not. If table c has 1000 rows and table i has 2000 rows (and each row in c matches at most 2 rows in i), you'll get a result set of 2 million rows, at least 1,998,000 of which are "can't join this row to that row because..."
A.id
1
2
3
B.id
3
4
5
The only row from A that joins with B is "3", and even then "3" from A doesn't join with 4 or 5 from B, and 3 from B doesn't join with 1 or 2 from A. For your single set of matched rows, you have 8 complaints that the rows don't match (3x3 rows total, minus one match)
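The arithmetic for the small A/B example can be verified with a sketch in Python's sqlite3 module, using exactly the ids listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (3), (4), (5);
""")

# Every A row paired with every B row; in SQLite the comparison
# A.id = B.id evaluates to 1 or 0, so SUM counts the matching pairs.
total, matched = conn.execute(
    "SELECT COUNT(*), SUM(A.id = B.id) FROM A CROSS JOIN B"
).fetchone()
non_matching = total - matched
print(total, matched, non_matching)
```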
So no, you can't feasibly ask a database to tell you which rows from this table didn't match which rows from that table because of condition X, because the answer is "nearly all of them didn't match" and "all" could be hundreds of millions
It gets marginally more feasible if you have some join columns that should work out all the time, and others that sometimes don't:
SELECT CASE WHEN a.something != b.other THEN 'this row would fail because something != other' END
FROM a JOIN b ON a.id = b.id --and a.something = b.other
But think about it for a second; relational databases are centered around the idea that data is related, and you can even enforce it with constraints: "don't allow row X to be inserted here unless it has an A and a B and a C value that is present in this other table's D and E and F columns"
That's what you should be using to ensure your joins work out (relational integrity), not allowing any old crap into the database and then trying to work out which rows might have joined to which other rows if only there wasn't some typo in column A that meant it didn't quite match up with D, even though B/C matched with E/F.
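As a rough sketch of what such a constraint looks like, here is a composite foreign key in Python's sqlite3 module. The table names inventory and log_data are hypothetical simplifications, not the real schema, and SQLite needs foreign key enforcement switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE inventory (
        SerLotNo TEXT NOT NULL,
        OrderNo  TEXT NOT NULL,
        Bin      TEXT NOT NULL,
        PRIMARY KEY (SerLotNo, OrderNo, Bin)
    );
    CREATE TABLE log_data (
        id         INTEGER PRIMARY KEY,
        ser_lot_no TEXT,
        ord_no     TEXT,
        line       TEXT,
        FOREIGN KEY (ser_lot_no, ord_no, line)
            REFERENCES inventory (SerLotNo, OrderNo, Bin)
    );
    INSERT INTO inventory VALUES ('S1', 'O1', 'L1');
""")

# All three values exist together in inventory, so this row is accepted.
conn.execute(
    "INSERT INTO log_data (ser_lot_no, ord_no, line) VALUES ('S1', 'O1', 'L1')"
)

# 'LX' is not a known Bin for that lot and order, so the insert is refused.
try:
    conn.execute(
        "INSERT INTO log_data (ser_lot_no, ord_no, line) VALUES ('S1', 'O1', 'LX')"
    )
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```

With the constraint in place, the bad row is stopped at insert time instead of silently failing to join later.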
Select BillName as A, ConsigneeName as B, ShipperName as C
from Sum_Orders
where (OrderStatus in ('Complete','Invoiced')
)
and
OrderPeriodYear IN (
(
YEAR(GETDATE())-1
)
)
Group by billname,ConsigneeName,ShipperName
I'm getting duplicates across A, B and C (which is expected).
I'm trying to write a condition that keeps the value in A and sets to NULL any value in B or C that repeats it:
IF A = B or A = C then keep A and set B or C to NULL
Thank you, guys :D
Is this what you want?
update t
set B = (case when B <> A then B end),
C = (case when C <> A then C end)
where B = A or C = A;
If you have to do this inline, then perhaps a CASE will help.
Select Billname AS A,
CASE WHEN ConsigneeName = Billname THEN NULL ELSE ConsigneeName END AS B,
CASE WHEN ShipperName = Billname THEN NULL ELSE ShipperName END AS C
from Sum_Orders etc ...
If the table is big, this may be expensive, and pushing this logic into the query itself might be better.
If A, B and C have the same value, B and C should be set to NULL, so (using MySQL's IF function):
Update tablename
Set B = if(A=B, null, B) , C=if(A=C, null, C)
-- where A=B or A=C
Uncomment the WHERE clause if performance matters!
If you only want to select the values:
Select A, if(A=B, null, B) as B , if(A=C, null, C) as C from tablename
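A more portable shorthand for the same idea is the standard NULLIF function, which returns NULL when its two arguments are equal and the first argument otherwise. This is an alternative to the CASE and IF forms above, not something the answers used; the sketch below runs it in Python's sqlite3 module on invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (A TEXT, B TEXT, C TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Acme", "Acme", "Bolt"),   # B repeats A
        ("Acme", "Cargo", "Acme"),  # C repeats A
        ("Acme", "Acme", "Acme"),   # both repeat A
        ("Acme", "Cargo", "Bolt"),  # nothing repeats
    ],
)

cleaned = conn.execute(
    "SELECT A, NULLIF(B, A) AS B, NULLIF(C, A) AS C FROM orders ORDER BY rowid"
).fetchall()
for row in cleaned:
    print(row)
```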
Pretty simple: what's the best way to fix the NULL filtering below, since the = operator doesn't work with NULL?
When Data1 = -1, the Key2 condition ends up comparing Key2 = NULL, which does not select the NULL values.
LEFT JOIN myReferenceTable
on myReferenceTable.Key1 = myDataTable.FKey1
and myReferenceTable.Key2 =
CASE
WHEN myDataTable.Data1 = -1 THEN NULL
ELSE myDataTable.Data3 - myDataTable.Data4
END
Thanks!
Swap out your CASE expression for ANDs / ORs:
LEFT JOIN myReferenceTable
on myReferenceTable.Key1 = myDataTable.FKey1
and ((myDataTable.Data1 = -1 AND myReferenceTable.Key2 IS NULL)
OR (myDataTable.Data1 != -1 AND myReferenceTable.Key2 = myDataTable.Data3 - myDataTable.Data4))
If you're using MySQL, you need to use the "null-safe equal" operator.
From the manual:
<=>
NULL-safe equal. This operator performs an equality comparison like
the = operator, but returns 1 rather than NULL if both operands are
NULL, and 0 rather than NULL if one operand is NULL.
mysql> SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
-> 1, 1, 0
mysql> SELECT 1 = 1, NULL = NULL, 1 = NULL;
-> 1, NULL, NULL
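To show a NULL-safe comparison inside the join itself, here is a sketch in Python's sqlite3 module. SQLite has no <=> operator, so its IS operator is used as the NULL-safe equality instead; the rows and the label column are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myDataTable (id INTEGER, FKey1 INTEGER, Data1 INTEGER,
                              Data3 INTEGER, Data4 INTEGER);
    CREATE TABLE myReferenceTable (Key1 INTEGER, Key2 INTEGER, label TEXT);
    INSERT INTO myReferenceTable VALUES (10, NULL, 'default'), (10, 5, 'five');
    INSERT INTO myDataTable VALUES
        (1, 10, -1, 9, 4),  -- Data1 = -1, should pick the NULL Key2 row
        (2, 10,  0, 9, 4),  -- 9 - 4 = 5, should pick the Key2 = 5 row
        (3, 10,  0, 9, 1);  -- 9 - 1 = 8, no matching reference row
""")

join_sql = """
    SELECT d.id, r.label
    FROM myDataTable d
    LEFT JOIN myReferenceTable r
      ON r.Key1 = d.FKey1
     AND r.Key2 {op} (CASE WHEN d.Data1 = -1 THEN NULL
                           ELSE d.Data3 - d.Data4 END)
    ORDER BY d.id
"""

with_equals = conn.execute(join_sql.format(op="=")).fetchall()
with_null_safe = conn.execute(join_sql.format(op="IS")).fetchall()
print(with_equals)
print(with_null_safe)
```

With = the first data row finds no partner, while the NULL-safe operator lets it match the reference row whose Key2 is NULL.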
My understanding is that you are trying to join myReferenceTable to myDataTable on Key1 and Key2 (Key1 is a direct match, and Key2 matches the difference between Data3 and Data4), but you want to explicitly filter out rows where Data1 = -1. It seems that you may also want to return a NULL if Data1 = -1, but what gets returned will depend on what's in your SELECT list. We're talking about the join here, and you're doing a LEFT JOIN, which tells me you want all rows from myDataTable returned, whether or not there are matches in myReferenceTable. Having said that, I would write the join as:
LEFT JOIN myReferenceTable
on myReferenceTable.Key1 = myDataTable.FKey1
and myDataTable.Data1 <> -1
AND myReferenceTable.Key2 = myDataTable.Data3 - myDataTable.Data4
My question is: is it possible to select certain rows in a table according to a comparison rule without removing anything from the result? To clarify what I want, imagine the following example.
I have a table with three columns:
A | B | C
1 | 0 | hey
1 | 1 | there
2 | 1 | this
3 | 0 | is
3 | 1 | a
4 | 0 | test
Now I want to select the rows that have a 0 in the B column and an 'a' in the C column, without removing the rows that don't have a 0 in column B but share the same value in column A.
For that I could do:
select C from T where A in (select A from T where B = 0);
But isn't it possible to select all C values where column B contains a 0, together with those that match them on column A?
I'd gladly provide more information if needed, since it is quite a fuzzy question, but SQL can be confusing sometimes.
Tough to tell without your example result set; but maybe something like this:
SELECT A, B, C
FROM myTable
WHERE (B = 0 AND C LIKE '%A%')
OR (B <> 0 AND B = A)
I think you just want an or condition:
select C
from T
where b = 0 or A in (select A from T where B = 0)
Is this the version you want:
select C
from T
where C = 'a' or A in (select A from T where B = 0)
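Since no expected result set was given, it may help to see what the queries actually return on the sample table from the question. This is a sketch using Python's sqlite3 module; it only loads the six rows shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER, B INTEGER, C TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, 0, "hey"), (1, 1, "there"), (2, 1, "this"),
     (3, 0, "is"), (3, 1, "a"), (4, 0, "test")],
)

# The question's original query.
in_only = [row[0] for row in conn.execute(
    "SELECT C FROM T WHERE A IN (SELECT A FROM T WHERE B = 0) ORDER BY rowid"
)]

# The variant with an extra OR on B = 0.
with_or = [row[0] for row in conn.execute(
    "SELECT C FROM T WHERE B = 0 OR A IN (SELECT A FROM T WHERE B = 0) "
    "ORDER BY rowid"
)]

print(in_only)
print(with_or)
```

On this sample data both queries return the same five rows and leave out only 'this', because any row with B = 0 already has its A value in the subquery result.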