I'm working on a problem with two tables. Charge and ChargeHistory. I want to display a selection of columns from both tables where either the matching row in ChargeHistory has a different value and/or date from Charge or if there is no matching entry in ChargeHistory at all.
I'm using a left outer join declared using the ansi standard and while it does show the rows correctly where there is a difference, it isn't showing the null entries.
I've read that there can sometimes be issues if you are using the WHERE clause as well as the ON clause. However when I try and put all the conditons in the ON clause the query takes too long > 15 minutes (so long I have just cancelled the runs).
To make things worse both tables use a three part compound key.
Does anyone have any ideas as to why the null values are being left out?
SELECT values...
FROM bcharge charge
LEFT OUTER JOIN chgHist history
ON charge.key1 = history.key1 AND charge.key2 = history.key2 AND charge.key3 = history.key3 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2'
AND (charge.value <> history.value OR charge.date <> history.date)
ORDER BY key1, key2, key
You probably want to explicitly select the null values:
SELECT values...
FROM bcharge charge
LEFT OUTER JOIN chgHist history
ON charge.key1 = history.key1 AND charge.key2 = history.key2 AND charge.key3 = history.key3 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2'
AND ((charge.value <> history.value or history.value is null) OR (charge.date <> history.date or history.date is null))
ORDER BY key1, key2, key
You can explicitly look for a match in the where. I would recommend looking at one of the keys used for the join:
SELECT . . .
FROM bcharge charge LEFT OUTER JOIN
chgHist history
ON charge.key1 = history.key1 AND charge.key2 = history.key2 AND
charge.key3 = history.key3 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2' AND
(charge.value <> history.value OR charge.date <> history.date OR history.key1 is null)
ORDER BY key1, key2, key;
The expressions charge.value <> history.value change the left outer join to an inner join because NULL results will be filtered out.
A WHERE clause filters the data returned by a join. Therefore when your inner table has null data for a particular column, the corresponding rows get filtered out based on your specified condition. That is why you should move that logic to the ON clause instead.
For the performance issues, you could consider adding indexes on the columns used for joining and filtering.
Have a look at this site, it will be very helpful for you, visual illustration of all the join statements with code samples
blog.codinghorror.com
Quoted of the relevant info in the above link:
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
Sample output:
id name id name
-- ---- -- ----
1 Pirate 2 Pirate
2 Monkey null null
3 Ninja 4 Ninja
4 Spaghetti null null
Left outer join
produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null
For any field from an outer joined table used in the where clause you must also permit an IS NULL option for that same field, otherwise you negate the effect of the outer join and the result is the same as if you had used an inner join.
SELECT
*
FROM bcharge charge
LEFT OUTER JOIN chgHist history
ON charge.key1 = history.key1
AND charge.key2 = history.key2
AND charge.key3 = history.key3
AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2'
AND (
(charge.value <> history.value OR history.value IS NULL)
OR
(charge.date <> history.date OR history.date IS NULL)
)
ORDER BY
key1, key2, key3
Edit: Appears that this is the same query structure used by Rene above, so treat this one as in support of that please.
Related
I have two tables I want to join on 3 keys. However, one of the keys may contain a value that doesn't exist in the other table, but I still want to retain these records upon query.
Something similar to this where key_3 in the first_table may contain a value of 0 which does not exist in the second_table.
SELECT
f.key_1,
f.key_2,
f.key_3,
s.column_4
FROM
first_table f
LEFT OUTER JOIN second_table s
ON f.key_1 = s.key_1
AND f.key_2 = s.key_2
AND (f.key_3 = s.key_3 OR f.key_3 = 0)
When I run this I get an error of OR not supported in JOIN currently '0'. I know in newer versions of Hive this is allowed, but is there a workaround for it in older versions.
Move condition with OR to the WHERE clause:
SELECT
f.key_1,
f.key_2,
f.key_3,
s.column_4
FROM
first_table f
LEFT OUTER JOIN second_table s
ON f.key_1 = s.key_1
AND f.key_2 = s.key_2
WHERE f.key_3 = s.key_3
OR f.key_3 = 0
OR s.key_1 is NULL --allow not joined records
I may not fully understand your question, but doesn't a simple left join do what you want?
SELECT f.key_1, f.key_2, f.key_3, s.column_4
FROM first_table f LEFT OUTER JOIN
second_table s
ON f.key_1 = s.key_1 AND
f.key_2 = s.key_2 AND
f.key_3 = s.key_3 ;
If there is no match in the second table -- regardless of the values in the first table -- the rows from the first table are still returned.
I'm using Hive SQL. Version is Hive 1.1.0-cdh5.14.0. In my example below, sp.close is a column with type double values. I checked sp.column and there are definitely no NULL values. Yet, in this select statement below, sp.close shows all NULL values. Why?
select
step1.*,
sp.close
from
step1 left join stockprices2 sp on (
step1.symbol = sp.symbol and
step1.year = sp.year and
step1.startmonth = sp.month and
step1.startday = sp.day and
step1.sector = sp.sector
)
;
Most likely, your left join did not find a matchin row in stockprices2. In that event, the row from step1 is retained, but all columns from stockprices2 will be null in the resultset. This is by design how the database signals that the left join came up empty.
You can easily verify that by just chaning the left join to an inner join: you should have less rows returned (where there is no match in stockprices2, the row from step1 is removed from the resultset), and no null values in sp.close.
Or you can add one of the columns used in the left join conditions in the select clause, and see that it's null too.
select
st.*,
sp.close,
sp.symbol -- null too
from step1 st
left join stockprices2 sp
on st.symbol = sp.symbol
and st.year = sp.year
and st.startmonth = sp.month
and st.startday = sp.day
and st.sector = sp.sector
Side note: the parentheses around the join conditions are superfluous.
The code below is supposed to return unique records in the lp_num field from the subquery to then be used in the outer query, but I am still getting multiples of the lp_num field. A ReferenceNumber can have multiple ApptDate records, but each lp_num can only have 1 rf_num. That's why I tried to retrieve unique lp_num records all the way down in the subquery, but it doesn't work. I am using Report Builder 3.0.
Current Output
Screenshot
The desired output would be to have only unique records in the lp_num field. This is because each value in the lp_num field is a pallet, one single pallet. the info to the right is when it arrived (ApptDate) and what the reference number is for the delivery (ref_num). Therefore, it makes no sense for a pallet to have multiple receipt dates...it can only arrive once...
SELECT DISTINCT
dbo.ISW_LPTrans.item,
dbo.ISW_LPTrans.lot,
dbo.ISW_LPTrans.trans_type,
dbo.ISW_LPTrans.lp_num,
dbo.ISW_LPTrans.ref_num,
(MIN(CONVERT(VARCHAR(10),dbo.CW_CheckInOut.ApptDate,101))) as appt_date_only,
dbo.CW_CheckInOut.ApptTime,
dbo.item.description,
dbo.item.u_m,
dbo.ISW_LPTrans.qty,
(CASE
WHEN dbo.ISW_LPTrans.trans_type = 'F'
THEN 'Produced internally'
ELSE
(CASE
WHEN dbo.ISW_LPTrans.trans_type = 'R'
THEN 'Received from outside'
END)
END
) as original_source
FROM
dbo.ISW_LPTrans
INNER JOIN dbo.CW_Dock_Schedule ON LTRIM(RTRIM(dbo.ISW_LPTrans.ref_num)) = dbo.CW_Dock_Schedule.ReferenceNumber
INNER JOIN dbo.CW_CheckInOut ON dbo.CW_CheckInOut.TruckID = dbo.CW_Dock_Schedule.TruckID
INNER JOIN dbo.item ON dbo.item.item = dbo.ISW_LPTrans.item
WHERE
(dbo.ISW_LPTrans.trans_type = 'R') AND
--CONVERT(VARCHAR(10),dbo.CW_CheckInOut.ApptDate,101) <= CONVERT(VARCHAR(10),dbo.ISW_LPTrans.trans_date,101) AND
dbo.ISW_LPTrans.lp_num IN
(SELECT DISTINCT
dbo.ISW_LPTrans.lp_num
FROM
dbo.ISW_LPTrans
INNER JOIN dbo.item ON dbo.ISW_LPTrans.item = dbo.item.item
INNER JOIN dbo.job ON dbo.ISW_LPTrans.ref_num = dbo.job.job AND dbo.ISW_LPTrans.ref_line_suf = dbo.job.suffix
WHERE
(dbo.ISW_LPTrans.trans_type = 'W' OR dbo.ISW_LPTrans.trans_type = 'I') AND
dbo.ISW_LPTrans.ref_num IN
(SELECT
dbo.ISW_LPTrans.ref_num
FROM
dbo.ISW_LPTrans
--INNER JOIN dbo.ISW_LPTrans on dbo.ISW_LPTrans.
WHERE
dbo.ISW_LPTrans.item LIKE #item AND
dbo.ISW_LPTrans.lot LIKE #lot AND
dbo.ISW_LPTrans.trans_type = 'F'
GROUP BY
dbo.ISW_LPTrans.ref_num
) AND
dbo.ISW_LPTrans.ref_line_suf IN
(SELECT
dbo.ISW_LPTrans.ref_line_suf
FROM
dbo.ISW_LPTrans
--INNER JOIN dbo.ISW_LPTrans on dbo.ISW_LPTrans.
WHERE
dbo.ISW_LPTrans.item LIKE #item AND
dbo.ISW_LPTrans.lot LIKE #lot AND
dbo.ISW_LPTrans.trans_type = 'F'
GROUP BY
dbo.ISW_LPTrans.ref_line_suf
)
GROUP BY
dbo.ISW_LPTrans.lp_num
HAVING
SUM(dbo.ISW_LPTrans.qty) < 0
)
GROUP BY
dbo.ISW_LPTrans.item,
dbo.ISW_LPTrans.lot,
dbo.ISW_LPTrans.trans_type,
dbo.ISW_LPTrans.lp_num,
dbo.ISW_LPTrans.ref_num,
dbo.CW_CheckInOut.ApptDate,
dbo.CW_CheckInOut.ApptTime,
dbo.item.description,
dbo.item.u_m,
dbo.ISW_LPTrans.qty
ORDER BY
dbo.ISW_LPTrans.lp_num
In a nutshell - the way you use DISTINCT is logically wrong from SQL perspective.
Your DISTINCT is in an IN subquery in the WHERE clause - and at that point of code it has absolutely no effect (except from the performance penalty). Think on it - if the outer query returns non-unique values of dbo.ISW_LPTrans.lp_num (which obvioulsy happens) those values can still be within the distinct values of the IN subquery - the IN does not enforce a 1-to-1 match, it only enforces the fact that the outer query values are within the inner values, but they can match multiple times. So it is definitely not DISTINCT's fault.
I would go through the following check steps:
See if there is insufficient JOIN ON condition(s) in the outer FROM section that leads to data multiplication (e.g. if a table has primary-to-foreign key relation on several columns, but you join on one of them only etc.).
Check which of the sources contains non-distinct records in the outer FROM section - then either cleanse your source, or adjust the JOIN condition and / or the WHERE clause so that you only pick distinct & correct records. In fact you might need to SELECT DISTINCT in the FROM sections - there it would make much more sense.
I have the following SQL query, and need to figure out where the "signatures" data is actually being read from. It's not from the 'claims' table, and doesn't seem to be from the 'questionnaire_answers' table. I believe it will be a boolean value, if that helps at all.
I'm reasonably proficient at SQL, but the joins have left me a bit confused.
(There's some PHP, but it's not relevant to the question).
$SQL="SELECT surveyor, COUNT(signed_total) AS 'total', SUM(signed_total) AS 'signed_total' FROM (
SELECT DISTINCT claims.claim_id, CONCAT(surveyors.user_first_name, CONCAT(' ', surveyors.user_surname)) AS 'surveyor', CASE WHEN signatures.claim_id IS NOT NULL THEN 1 ELSE 0 END AS 'signed_total' FROM claims
INNER JOIN users surveyors ON claims.surveyor_id = surveyors.user_id
LEFT OUTER JOIN signatures ON claims.claim_id = signatures.claim_id
INNER JOIN questionnaire_answers ON questionnaire_answers.claim_id = claims.claim_id
WHERE (claims.claim_type <> ".$conn->qstr(TYPE_DESKTOP).")
AND (claims.claim_type <> ".$conn->qstr(TYPE_AUDIT).")
AND (claims.claim_cancelled_id <= 0)
AND (claims.date_completed BETWEEN '".UK2USDate($start_date)." 00:00:00' AND '".UK2USDate($end_date)." 23:59:59')
) AS tmp
GROUP BY surveyor
ORDER BY surveyor ASC
";
Thank you!
signatures is a table (see LEFT OUTER JOIN signatures in your query).
As written in FROM clause :
FROM claims
INNER JOIN users surveyors ON claims.surveyor_id = surveyors.user_id
LEFT OUTER JOIN signatures ON claims.claim_id = signatures.claim_id
The LEFT keyword means that the rows of the left table are preserved; So all rows from claims table are considered and NULL marks are added as placeholders for the attributes from the nonpreserved side of the join which is signatures table here.
So CASE WHEN signatures.claim_id IS NOT NULL THEN 1 ELSE 0 END AS 'signed_total' basically checks that if a match between these two tables exists based on claim_id then signed_total column should have value 1 else 0.
Hope that helps!!
I was using this query to get results,
SELECT *
FROM TRXN_REPORT CNT,
TBL_CUSTINFO CUSTADDINFO,
TBL_DEAL_EVT EVENT,
TBL_MISCADDR CIFMISC
WHERE CNT.INTERNAL_REF_NUM = EVENT.INTERNAL_REF_NUM
AND CNT.BRANCH = EVENT.BRANCH
AND CNT.QFXVERSION = EVENT.QFXVERSION
AND CNT.VALUEDATE = EVENT.VALUEDATE
AND CNT.LIQ_STATUS = 'L'
AND EVENT.EVENTCODE = 'LLIQ' -- Liquidation Event
AND CUSTADDINFO.CUSTOMER_NO = CNT.CUST_NUM
AND CIFMISC.CUSTOMER_NUMBER = CNT.CUST_NUM
AND CUSTADDINFO.REC_STATUS ='O'
AND CIFMISC.REC_STATUS = 'O'
AND MY_PBG_CUSTOMER = 'Y'
AND (BUYCCYCODE = 'USD' OR SELLCCYCODE ='USD')
here this condition
AND CIFMISC.CUSTOMER_NUMBER = CNT.CUST_NUM
Is fetching only those records which are present both in TBL_MISCADDR and TRXN_REPORT table, now as a change of requirment I want to perform left outer join so as to get records which are not there in TBL_MISCADDR table but present in TRXN_BATCHREPORT_WRK. How to perform this along with maintaining all the where conditions.
You have to you specify the condition for the join in the ON clause of the JOIN. So, your current conditions in the WHERE clause should be separated into the conditions for joining the tables (and these moved to ON clause), and the rest for selecting the lines - and these should be left in the WHERE clause. Example (don't know if I have it correct according to your data but you will get the idea):
SELECT *
FROM TRXN_REPORT CNT
LEFT JOIN TBL_DEAL_EVT EVENT ON
CNT.INTERNAL_REF_NUM = EVENT.INTERNAL_REF_NUM
AND CNT.BRANCH = EVENT.BRANCH
AND CNT.QFXVERSION = EVENT.QFXVERSION
AND CNT.VALUEDATE = EVENT.VALUEDATE
LEFT JOIN SOME_OTHER_TABLE T ON
...
WHERE
AND CUSTADDINFO.REC_STATUS ='O'
AND CIFMISC.REC_STATUS = 'O'
AND MY_PBG_CUSTOMER = 'Y'
AND (BUYCCYCODE = 'USD' OR SELLCCYCODE ='USD')
....
This way you can safely use LEFT JOIN and the lines for which the records doesn't exist in the other table will also be included.
... to get records which are not there in TBL_MISCADDR table but
present in TRXN_BATCHREPORT_WRK.
That would be done using TRXN_BATCHREPORT_WRK LEFT JOIN TBL_MISCADDR ON <condition>.