I am constructing a large table where I need to use CASE several places in the same select. The problem arises when I need to use an already selected column in a CASE in the same select. Example:
SELECT CASE WHEN (B.field_B1 + C.field_C1) > x THEN x ELSE 0 END AS field_1,
CASE WHEN field_1 > y THEN 'larger' ELSE 'smaller' END AS field 2,
CASE WHEN field_1 > z THEN 'taller' ELSE 'shorter' END AS field 3
FROM table_A AS A
INNER JOIN table_B as B on A.key_1 = B.KEY_1
LEFT OUTER JOIN table_C as C on A.key_1 = C.KEY_1
The table of interest consists of a few hundred columns and several million rows, so performance is a key issue here. CROSS APPLY is not supported in this case, and I would like to avoid WITH before the SELECT as this is a small part of a large script. Any creative ideas?
Related
I have the following SQL Server query:
SELECT TOP (100) PERCENT
dbo.cct_prod_plc_log_data.wc,
dbo.cct_prod_plc_log_data.loc,
dbo.cct_prod_plc_log_data.ord_no,
dbo.cct_prod_plc_log_data.ser_lot_no,
dbo.cct_prod_plc_log_data.line,
ISNULL(dbo.imlsmst_to_sfdtlfil.ItemNo, '') AS ItemNo,
ISNULL(dbo.imlsmst_to_sfdtlfil.BldSeqNo, '') AS BldSeqNo,
ISNULL(dbo.imlsmst_to_sfdtlfil.BldOrdNo, '') AS BldOrdNo,
ISNULL(dbo.imlsmst_to_sfdtlfil.StringItemNo, '') AS StringItemNo,
ISNULL(dbo.imlsmst_to_sfdtlfil.StringSerLotNo, '') AS StringSerLotNo,
MAX(dbo.cct_prod_plc_log_data.InsertDateTime) AS LatestDateTime,
MIN(ISNULL(dbo.cct_prod_plc_log_data.erp_transaction_id, 0)) AS MinimumErpID,
ISNULL(dbo.imlsmst_to_sfdtlfil.QtyOnHand, 0) AS QtyOnHand
FROM
dbo.cct_prod_plc_log_data
LEFT OUTER JOIN dbo.imlsmst_to_sfdtlfil
ON dbo.cct_prod_plc_log_data.ser_lot_no = dbo.imlsmst_to_sfdtlfil.SerLotNo
AND dbo.cct_prod_plc_log_data.ord_no = dbo.imlsmst_to_sfdtlfil.OrderNo
AND dbo.cct_prod_plc_log_data.line = dbo.imlsmst_to_sfdtlfil.Bin
WHERE
( dbo.cct_prod_plc_log_data.erp_transaction_id < 3 OR dbo.cct_prod_plc_log_data.erp_transaction_id IS NULL )
AND (dbo.cct_prod_plc_log_data.wc <> '')
AND (dbo.cct_prod_plc_log_data.loc <> '')
AND (dbo.cct_prod_plc_log_data.line <> '')
GROUP BY
dbo.cct_prod_plc_log_data.wc,
dbo.cct_prod_plc_log_data.loc,
dbo.cct_prod_plc_log_data.ord_no,
dbo.cct_prod_plc_log_data.ser_lot_no,
dbo.cct_prod_plc_log_data.line,
dbo.imlsmst_to_sfdtlfil.ItemNo,
dbo.imlsmst_to_sfdtlfil.BldSeqNo,
dbo.imlsmst_to_sfdtlfil.BldOrdNo,
dbo.imlsmst_to_sfdtlfil.StringItemNo,
dbo.imlsmst_to_sfdtlfil.StringSerLotNo,
dbo.imlsmst_to_sfdtlfil.QtyOnHand
ORDER BY dbo.cct_prod_plc_log_data.ord_no DESC
It contains a Left Outer Join between the two tables on 3 fields. Based on the current construction if any of the 3 Joined fields in the right table (dbo.imlsmst_to_sfdtlfil) are null or missing then the fields in the left query should return null.
How do I determine which of the 3 fields is the field that caused the join to fail? I would like to differentiate these from each other. Thanks.
(Ex. ser_lot_no and ord_no exists but bin is null vs bin and ord_no exist but ser_lot_no is null. )
Change it for an inner join and comment out all but one of the conditions, then uncomment them one at a time until the data disappears again - that's the faulty condition. If there was no data even with just one condition, that's the faulty condition:
SELECT
c.wc,
c.loc,
c.ord_no,
c.ser_lot_no,
c.line,
COALESCE(i.ItemNo, '') AS ItemNo,
COALESCE(i.BldSeqNo, '') AS BldSeqNo,
COALESCE(i.BldOrdNo, '') AS BldOrdNo,
COALESCE(i.StringItemNo, '') AS StringItemNo,
COALESCE(i.StringSerLotNo, '') AS StringSerLotNo,
MAX(c.InsertDateTime) AS LatestDateTime,
MIN(COALESCE(c.erp_transaction_id, 0)) AS MinimumErpID,
COALESCE(i.QtyOnHand, 0) AS QtyOnHand
FROM
dbo.cct_prod_plc_log_data c
INNER JOIN dbo.imlsmst_to_sfdtlfil i
ON
c.ser_lot_no = i.SerLotNo
--AND c.ord_no = i.OrderNo
--AND c.line = i.Bin
WHERE
( c.erp_transaction_id < 3 OR c.erp_transaction_id IS NULL )
AND (c.wc <> '')
AND (c.loc <> '')
AND (c.line <> '')
GROUP BY
c.wc,
c.loc,
c.ord_no,
c.ser_lot_no,
c.line,
COALESCE(i.ItemNo, ''),
COALESCE(i.BldSeqNo, '')
COALESCE(i.BldOrdNo, '')
COALESCE(i.StringItemNo, '')
COALESCE(i.StringSerLotNo, '')
COALESCE(i.QtyOnHand, 0)
ORDER BY c.ord_no DESC
Using INNER JOIN is more obvious than OUTER JOIN as most query tools give a row count and it's easier to see the row count changing from 99990 to 100000 than it is to eyeball 100000 rows looking for 10 that are null when they shouldn't be
If you have more than 2 tables, comment out your select block, put a *, and all but 2 tables:
SELECT *
/* columns,list,here,blah,blah */
FROM
table1
JOIN table2 ON ...
--JOIN table3 on ...
--JOIN table4 on ...
Run it, get the expected number of rows, then proceed uncommenting more and more tables. If at any point your row count changes unexpectedly (more when you expected less, or less when you expected more) investigate.
If the row count increases, it's probably a cartesian product and should be resolved by adding extra join conditions, not by whacking a DISTINCT in
Other top tips:
Use COALESCE rather than ISNULL; improve your database cross skilling
Alias tables and use the alias name, rather than repeating the schema and column name everywhere
GROUP BY the coalesced result rather than the column, if you're using a DB that draws a distinction between empty string and null string, otherwise you'll end up with two rows in your results when you expect 1
Edit: You said:
Thank you for the insight and tips. However, my problem was more so a question on how to incorporate the information on which field was causing the join to fail as a permanent addition rather than a one time audit. Any insight for that? –
And I say:
You can't feasibly do this, the database cannot tell you "which field" isn't working out because most of them that aren't working out. To see what I mean, run this:
SELECT
-- replace .id with the name of the pk column
CONCAT('Cannot join c[', c.id, '] to i[', i.id, '] because: ',
CASE
WHEN COALESCE(c.ser_lot_no, 'null') != COALESCE(i.SerLotNo, 'null ') THEN 'c.ser_lot_no != i.SerLotNo, '
END,
CASE
WHEN COALESCE(c.ord_no, 'null') != COALESCE(i.OrderNo, 'null ') THEN 'c.ord_no != i.OrderNo, '
END,
CASE
WHEN COALESCE(c.line, 'null') != COALESCE(i.Bin, 'null ') THEN 'c.line != i.Bin, '
END
)
FROM
dbo.cct_prod_plc_log_data c
CROSS JOIN dbo.imlsmst_to_sfdtlfil i
It asks the database to join every row to every other row and then look at the values on the row and work out whether it can be joined or not.. If Table c has 1000 rows and table i has 2000 rows (and each row in c matches at most 2 rows in i), you'll get a result set of 2 million rows, 1998000 of which are "can't join this row to that row because..."
A.id
1
2
3
B.id
3
4
5
The only row from A that joins with B is "3", and even then "3" from A doesn't join with 4 or 5 from B, and 3 from B doesn't join with 1 or 2 from A. For your single set of matched rows, you have 8 complaints that the rows don't match (3x3 rows total, minus one match)
So no, you can't feasibly ask a database to tell you which rows from this table didn't match which rows from that table because of condition X, because the answer is "nearly all of them didn't match" and "all" could be hundreds of millions
It gets marginally more feasible if you have some join columns that should work out all the time, and others that sometimes don't:
SELECT CASE WHEN a.something != b.other THEN 'this row would fail because something != other' END
FROM a JOIN b ON a.id = b.id --and a.something = b.other
But think about it for a second; relational databases are centered around the idea that data is related, and you can even enforce it with constraints: "don't allow row X to be inserted here unless it has an A and a B and a C value that is present in this other table's D and E and F columns"
That's what you should be using to ensure your joins work out (relational integrity), not allowing any old crap into the database and then trying to work out which rows might have joined to which other rows if only there wasn't some typo in column A that meant it didn't quite match up with D, even though B/C matched with E/F ..
I have a question on sql desgin.
Context:
I have a table called t_master and 13 other tables (lets call them a,b,c... for simplicity) where it needs to compared.
Logic:
t_master will be compared to table 'a' where t_master.gen_val =
a.value.
If record exist in t_master, retrieve t_master record, else retrieve 'a' record.
I do not need to retrieve the records if it exists in both tables (t_master and a) - XOR condition
Repeat this comparison with the remaining 12 tables.
I have some idea on doing this, using WITH to subquery the non-master tables (a,b,c...) first with their respective WHERE clause.
Then use XOR statement to retrieve the records.
Something like
WITH a AS (SELECT ...),
b AS (SELECT ...)
SELECT field1,field2...
FROM t_master FULL OUTER JOIN a FULL OUTER JOIN b FULL OUTER JOIN c...
ON t_master.gen_value = a.value
WHERE ((field1 = x OR field2 = y ) AND NOT (field1 = x AND field2 = y))
AND ....
.
.
.
.
Seeing that I have 13 tables that I need to full outer join, is there a better way/design to handle this?
Otherwise I would have at least 2*13 lines of WHERE clause which I'm not sure if that will have impact on the performance as t_master is sort of a log table.
**Assume I cant change any schema.
Currently I'm not sure if this SQL will working correctly yet, so I'm hoping someone can guide me in the right direction regarding this.
update from used_by_already's suggestion:
This is what I'm trying to do (comparison between 2 tables first, before I add more, but I am unable to get values from ATP_R.TBL_HI_HDR HI_HDR as it is in the NOT EXISTS subquery.
How do i overcome this?
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO JOIN ATP_R.TBL_HI_HDR HI_HDR ON LOG_REPO.GEN_VAL = HI_HDR.HI_NO
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_R.TBL_HI_HDR HI_HDR
WHERE LOG_REPO.GEN_VAL = HI_HDR.HI_NO
)
UNION ALL
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_R.TBL_HI_HDR HI_HDR JOIN ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO ON HI_HDR.HI_NO = LOG_REPO.GEN_VAL
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO
WHERE HI_HDR.HI_NO = LOG_REPO.GEN_VAL
)
Full outer joins used to exclude all matching rows can be an expensive query. You don't supply much detail, but perhaps using NOT EXISTS would be simpler and maybe it will produce a better explain plan. Something along these lines.
select
cola,colb,colc
from t_master m
where not exists (
select null from a where m.keycol = a.fk_to_m
)
and not exists (
select null from b where m.keycol = b.fk_to_m
)
and not exists (
select null from c where m.keycol = c.fk_to_m
)
union all
select
cola,colb,colc from a
where not exists (
select null from t_master m where a.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from b
where not exists (
select null from t_master m where b.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from c
where not exists (
select null from t_master m where c.fk_to_m = m.keycol
)
You could union the 13 a,b,c ... tables to simplify the coding, but that may not perform so well.
In my project I need find difference task based on old and new revision in the same table.
id | task | latest_Rev
1 A N
1 B N
2 C Y
2 A Y
2 B Y
Expected Result:
id | task | latest_Rev
2 C Y
So I tried following query
Select new.*
from Rev_tmp nw with (nolock)
left outer
join rev_tmp old with (nolock)
on nw.id -1 = old.id
and nw.task = old.task
and nw.latest_rev = 'y'
where old.task is null
when my table have more than 20k records this query takes more time?
How to reduce the time?
In my company don't allow to use subquery
Use LAG function to remove the self join
SELECT *
FROM (SELECT *,
CASE WHEN latest_Rev = 'y' THEN Lag(latest_Rev) OVER(partition BY task ORDER BY id) ELSE NULL END AS prev_rev
FROM Rev_tmp) a
WHERE prev_rev IS NULL
My answer assumes
You can't change the indexes
You can't use subqueries
All fields are indexed separately
If you look at the query, the only value that really reduces the resultset is latest_rev='Y'. If you were to eliminate that condition, you'd definitely get a table scan. So we want that condition to be evaluated using an index. Unfortunately a field that just values 'Y' and 'N' is likely to be ignored because it will have terrible selectivity. You might get better performance if you coax SQL Server into using it anyway. If the index on latest_rev is called idx_latest_rev then try this:
Set transaction isolated level read uncommitted
Select new.*
from Rev_tmp nw with (index(idx_latest_rev))
left outer
join rev_tmp old
on nw.id -1 = old.id
and nw.task = old.task
where old.task is null
and nw.latest_rev = 'y'
latest_Rev should be a Bit type (boolean equivalent), i better for performance (Detail here)
May be can you add index on id, task
, latest_Rev columns
You can try this query (replace left outer by not exists)
Select *
from Rev_tmp nw
where nw.latest_rev = 'y' and not exists
(
select * from rev_tmp old
where nw.id -1 = old.id and nw.task = old.task
)
I have a pretty big MSSQL stored procedure that I need to conditionally check for certain IDs:
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (1,2,3,4,5)
I wanted to conditionally check the SomeID field, so I did the following:
if #enteredText = 'This'
INSERT INTO #AwesomeIDs
VALUES(1),(2),(3)
if #enteredText = 'That'
INSERT INTO #AwesomeIDs
VALUES(4),(5)
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (Select ID from #AwesomeIDs)
Nothing else has changed, yet I can't even get the latter query to grab 5 records. The top query returns 5000 records in less than 3 seconds. Why is selecting from a table variable so much drastically slower?
Two other possible options you can consider
Option 1
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where
( b.SomeID IN (1,2,3) AND #enteredText = 'This')
OR
( b.SomeID IN (4,5) AND #enteredText = 'That')
Option 2
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where EXISTS (Select 1
from #AwesomeIDs
WHERE b.SomeID = ID)
Mind you for Table variables , SQL Server always assumes there is only ONE row in the table (except sql 2014 , assumption is 100 rows) and it can affect the estimated and actual plans. But 1 row against 3 not really a deal breaker.
Is it possible to pull values from 2 different tables based on the value of a column? For example, I have a table with a boolean column that either returns 0 or 1 depending on what the end user selects in our program. 0 means that I should pull in the default values. 1 means to use the user's data.
If my table Table1 looked like this:
Case ID Boolean
====================
1 0
2 1
3 1
4 0
5 0
Then I would need to pull Case IDs 1,4,and 5's corresponding data from table Default and Case IDs 3 and 4's corresponding data from table UserDef. Then I would have to take these values, combine them, and reorder them by Case ID so I can preserve the order in the resulting table.
I am fairly inexperienced with SQL but I am trying to learn. Any help or suggestions are greatly appreciated. Thank you in advance for your help.
Something like this:
SELECT
t1.CaseID
,CASE WHEN t1.Boolean = 1 THEN dt.Col1 ELSE ut.Col1 END AS Col1
,CASE WHEN t1.Boolean = 1 THEN dt.Col2 ELSE ut.Col2 END AS Col2
FROM Table1 t1
LEFT JOIN DefaultTable dt ON dt.CaseID = t1.CaseID
LEFT JOIN UserDefTable ut ON ut.CaseID = t1.CaseID
ORDER BY t1.CaseID
You join on both tables and then use CASE in SELECT to choose from which one to display data.
Option B:
WITH CTE_Combo AS
(
SELECT 0 as Boolean, * FROM Default --replace * with needed columns
UNION ALL
SELECT 1 AS Boolean, * FROM UserDef --replace * with needed columns
)
SELECT * FROM Table1 t
LEFT JOIN CTE_Combo c ON t.CaseID = c.CaseID AND t.Boolean = c.Boolean
ORDER BY t.CaseID
This might be even simpler - using CTE make a union of both tables adding artificial column, and then join CTE and your Table using both ID and flag column.
SELECT t1.CaseID,
ISNULL(td.data, tu.data) userData -- pick data from table_default
-- if not null else from table_user
FROM table1 t1
LEFT JOIN table_default td ON t1.CaseID = td.CaseID -- left join with table_default
AND t1.Boolean = 0 -- when boolean = 0
LEFT JOIN table_user tu ON t1.CaseID = tu.CaseID -- left join with table_user
AND t1.Boolean = 1 -- when boolean = 1
ORDER BY t1.CaseID