Hive non-equi Join on OR condition - sql

I have two tables I want to join on 3 keys. However, one of the keys may contain a value that doesn't exist in the other table, but I still want to retain these records upon query.
Something similar to this where key_3 in the first_table may contain a value of 0 which does not exist in the second_table.
SELECT
f.key_1,
f.key_2,
f.key_3,
s.column_4
FROM
first_table f
LEFT OUTER JOIN second_table s
ON f.key_1 = s.key_1
AND f.key_2 = s.key_2
AND (f.key_3 = s.key_3 OR f.key_3 = 0)
When I run this I get the error "OR not supported in JOIN currently '0'". I know newer versions of Hive allow this, but is there a workaround for older versions?

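Move the condition with OR to the WHERE clause. Note that a first_table row that matches on key_1 and key_2 but fails both key_3 tests is then filtered out instead of being returned with NULLs: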
SELECT
f.key_1,
f.key_2,
f.key_3,
s.column_4
FROM
first_table f
LEFT OUTER JOIN second_table s
ON f.key_1 = s.key_1
AND f.key_2 = s.key_2
WHERE f.key_3 = s.key_3
OR f.key_3 = 0
OR s.key_1 is NULL --allow not joined records
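To see how the WHERE-clause version behaves next to an OR inside the ON clause, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for Hive, since SQLite accepts OR in a join condition, and the table contents are invented for illustration.

```python
import sqlite3

# SQLite stands in for Hive here; the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE first_table  (key_1 INT, key_2 INT, key_3 INT);
CREATE TABLE second_table (key_1 INT, key_2 INT, key_3 INT, column_4 TEXT);
INSERT INTO first_table  VALUES (1, 1, 5), (1, 1, 0), (2, 2, 7), (3, 3, 9);
INSERT INTO second_table VALUES (1, 1, 5, 'a'), (2, 2, 8, 'b');
""")

# OR inside the ON clause (what the question wanted to write).
or_in_on = conn.execute("""
SELECT f.key_1, f.key_2, f.key_3, s.column_4
FROM first_table f
LEFT OUTER JOIN second_table s
  ON f.key_1 = s.key_1
 AND f.key_2 = s.key_2
 AND (f.key_3 = s.key_3 OR f.key_3 = 0)
ORDER BY f.key_1, f.key_3
""").fetchall()

# OR moved to the WHERE clause (the workaround for older Hive versions).
or_in_where = conn.execute("""
SELECT f.key_1, f.key_2, f.key_3, s.column_4
FROM first_table f
LEFT OUTER JOIN second_table s
  ON f.key_1 = s.key_1
 AND f.key_2 = s.key_2
WHERE f.key_3 = s.key_3
   OR f.key_3 = 0
   OR s.key_1 IS NULL
ORDER BY f.key_1, f.key_3
""").fetchall()

print(or_in_on)     # keeps (2, 2, 7, None): no match, so NULL-extended
print(or_in_where)  # drops (2, 2, 7): it joined on key_1/key_2, then failed the WHERE
```

With this sample data the two queries agree except for the row (2, 2, 7), which has a partner on key_1 and key_2 but a different non-zero key_3. Whether that difference matters depends on the data.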

I may not fully understand your question, but doesn't a simple left join do what you want?
SELECT f.key_1, f.key_2, f.key_3, s.column_4
FROM first_table f LEFT OUTER JOIN
second_table s
ON f.key_1 = s.key_1 AND
f.key_2 = s.key_2 AND
f.key_3 = s.key_3 ;
If there is no match in the second table -- regardless of the values in the first table -- the rows from the first table are still returned.

Related

Why is Hive SQL returning NULL values for a particular column in a SELECT statement when that column has all double values?

I'm using Hive SQL. Version is Hive 1.1.0-cdh5.14.0. In my example below, sp.close is a column with type double values. I checked sp.column and there are definitely no NULL values. Yet, in this select statement below, sp.close shows all NULL values. Why?
select
step1.*,
sp.close
from
step1 left join stockprices2 sp on (
step1.symbol = sp.symbol and
step1.year = sp.year and
step1.startmonth = sp.month and
step1.startday = sp.day and
step1.sector = sp.sector
)
;
Most likely, your left join did not find a matching row in stockprices2. In that event, the row from step1 is retained, but all columns from stockprices2 will be null in the resultset. This is by design how the database signals that the left join came up empty.
You can easily verify that by just changing the left join to an inner join: you should have fewer rows returned (where there is no match in stockprices2, the row from step1 is removed from the resultset), and no null values in sp.close.
Or you can add one of the columns used in the left join conditions in the select clause, and see that it's null too.
select
st.*,
sp.close,
sp.symbol -- null too
from step1 st
left join stockprices2 sp
on st.symbol = sp.symbol
and st.year = sp.year
and st.startmonth = sp.month
and st.startday = sp.day
and st.sector = sp.sector
Side note: the parentheses around the join conditions are superfluous.
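The check described above can be tried on a toy dataset. This is only a sketch: it assumes SQLite via Python's sqlite3 rather than Hive, it trims the join down to two of the five key columns, and the rows are invented.

```python
import sqlite3

# Toy data: 'BBB' has no price row, so the left join cannot find a match for it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE step1        (symbol TEXT, year INT);
CREATE TABLE stockprices2 (symbol TEXT, year INT, close REAL);
INSERT INTO step1        VALUES ('AAA', 2020), ('BBB', 2020);
INSERT INTO stockprices2 VALUES ('AAA', 2020, 10.5);
""")

query = """
SELECT st.symbol, sp.close, sp.symbol
FROM step1 st
{join_type} JOIN stockprices2 sp
  ON st.symbol = sp.symbol
 AND st.year = sp.year
ORDER BY st.symbol
"""

# Left join: unmatched step1 rows survive, with NULL in every sp column.
left_rows = conn.execute(query.format(join_type="LEFT")).fetchall()
# Inner join: unmatched step1 rows disappear, so no NULLs remain.
inner_rows = conn.execute(query.format(join_type="INNER")).fetchall()

print(left_rows)
print(inner_rows)
```

If the real query shows NULL in sp.symbol wherever sp.close is NULL, the cause is a missing match (for example a type or formatting difference in one of the join keys), not NULLs stored in the close column.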

Left outer join does not select all equipment notifs

I want to select all notifications with the relevant information and I also want the notifications that have no equipment. But when I use below join, I only get the ones where the equipment is not null. Shouldn't the left outer join make sure I get everything in table VIQMEL?
I do get the notifications that have no equipment if I delete the AND K~SPRAS EQ 'E'.
Any ideas on how to resolve this?
SELECT v~qmnum,
v~qmart,
t~istat,
t~txt30,
v~aufnr,
v~tplnr,
v~equnr,
v~btpln,
v~qmnam,
v~qmgrp,
v~qmcod,
ct~kurztext,
gt~kurztext,
v~beber,
k~eqktx,
v~qmtxt,
ax~pltxt,
fx~pltxt,
v~priok,
v~erdat,
s~tdid,
a~reltype,
z~aduser
FROM viqmel AS v
LEFT OUTER JOIN iflot AS f ON v~tplnr = f~tplnr
LEFT OUTER JOIN jest AS j ON j~objnr = v~objnr
LEFT OUTER JOIN tj02t AS t ON t~istat = j~stat
LEFT OUTER JOIN iflotx AS fx ON fx~tplnr = v~tplnr
LEFT OUTER JOIN iflotx AS ax ON ax~tplnr = v~btpln
LEFT OUTER JOIN qpct AS ct ON ct~code = v~qmcod
LEFT OUTER JOIN eqkt AS k ON v~equnr = k~equnr
LEFT OUTER JOIN qpgt AS gt ON gt~codegruppe = v~qmgrp
LEFT OUTER JOIN stxh AS s ON s~tdname = v~qmnum
LEFT OUTER JOIN srgbtbrel AS a ON v~qmnum = a~instid_a
LEFT OUTER JOIN zzid_map AS Z ON v~qmnam = z~sapuser
WHERE t~spras = @sy-langu
AND v~qmnum LIKE @p_qmnum
AND v~equnr LIKE @p_equnr
AND v~qmnam LIKE @p_qmnam
AND v~aufnr LIKE @p_aufnr
AND f~tplnr LIKE @p_tplnr
AND t~istat LIKE @p_istat
AND v~beber LIKE @p_beber
AND j~inact <> @abap_true
AND t~istat <> 'I0076'
AND t~spras = 'E'
AND fx~spras = 'E'
AND k~spras = 'E'
INTO TABLE @DATA(et_notifs).
Side note: EQKT is equipment short text (not equipment) and EQKT~SPRAS is language.
Problem: Your condition selects only English texts, which is why it drops records that are joined with a non-English text as well as records that aren't joined at all.
So if you have ( number represents a key ) your text table
1 E ....
2 X ....
3 N ....
4 E ....
After the join, the texts from the table come out like this
1 E ....
2 [initial]
3 [initial]
4 E ....
After filter you're left with
1 E ....
4 E ....
Solutions
Unnecessarily complicated solution, using exclusion subquery
Given the restrictions of SAP Open SQL, excluding joins, as well as joins that include records based on the absence of corresponding records in other tables, are not possible.
The workarounds for excluding joins are generally sub-queries.
You could add a subquery that applies the language filter where a text in that language exists and bypasses the filter otherwise (to include empty records). Try replacing and K~SPRAS EQ 'E' with the following:
and ( K~SPRAS in (select SPRAS from EQKT where EQUNR = V~EQUNR and SPRAS = 'E')
OR NOT EXISTS (select SPRAS from EQKT where EQUNR = V~EQUNR and SPRAS = 'E')
)
The idea here is you have 2 subqueries. One of them uses a positive check to include all the languages you need. The other uses a negative check and includes records where that particular language does not exist.
Update: Minimalistic solution (left join on key + condition)
After looking at your question with a clear head, I noticed my solution might be too complicated for your needs (even though it will work).
A standard left join on key + condition will fulfill your requirement. Move your and K~SPRAS EQ 'E' into the join condition and it will select exactly the way you want it to. Also, if I recall correctly, the outer keyword doesn't do anything on left/right joins.
LEFT JOIN EQKT AS K ON V~EQUNR EQ K~EQUNR AND K~SPRAS EQ 'E'
PS: Aliases and redundant joins in the question aren't helping with its readability.
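The difference between filtering the text table in WHERE and filtering it in the join condition can be reproduced outside SAP. The sketch below assumes SQLite via Python's sqlite3 instead of ABAP Open SQL, keeps only the two tables involved, and uses invented notification and equipment numbers.

```python
import sqlite3

# Reduced stand-ins for VIQMEL (notifications) and EQKT (equipment short texts).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE viqmel (qmnum TEXT, equnr TEXT);
CREATE TABLE eqkt   (equnr TEXT, spras TEXT, eqktx TEXT);
INSERT INTO viqmel VALUES ('100', 'EQ1'), ('200', 'EQ2'), ('300', '');
INSERT INTO eqkt   VALUES ('EQ1', 'E', 'Pump'), ('EQ1', 'D', 'Pumpe'), ('EQ2', 'D', 'Ventil');
""")

# Language filter in WHERE: rows whose text is missing or non-English are removed.
filter_in_where = conn.execute("""
SELECT v.qmnum, k.eqktx
FROM viqmel v
LEFT JOIN eqkt k ON v.equnr = k.equnr
WHERE k.spras = 'E'
ORDER BY v.qmnum
""").fetchall()

# Language filter in ON: every notification is kept, text is NULL when no English row exists.
filter_in_on = conn.execute("""
SELECT v.qmnum, k.eqktx
FROM viqmel v
LEFT JOIN eqkt k ON v.equnr = k.equnr AND k.spras = 'E'
ORDER BY v.qmnum
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

Notification 300 (no equipment) and notification 200 (equipment with only a German text) appear only in the second result, which is the behavior the question asks for.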

SQL - How to put a condition for which table is selected without left join

I have a flag in a table whose value (1 for US, or 2 for Global) indicates whether the data will be in Table A or Table B.
A solution that works is to left join both tables; however, this slows the scripts down significantly (from less than a second to over 15 seconds).
Is there any other clever way to do this? Something equivalent to:
join TableA only if TableCore.CountryFlag = "US"
join TableB only if TableCore.CountryFlag = "global"
Thanks a lot for the help.
You can try using this approach:
-- US data
SELECT
YourColumns
FROM
TableCore
INNER JOIN TableA AS T ON TableCore.JoinColumn = T.JoinColumn
WHERE
TableCore.CountryFlag = 'US'
UNION ALL
-- Non-US Data
SELECT
YourColumns -- These columns must match in number and datatype with previous SELECT
FROM
TableCore
INNER JOIN TableB AS T ON TableCore.JoinColumn = T.JoinColumn
WHERE
TableCore.CountryFlag = 'global'
However, if the result is still slow, you might want to check whether the TableCore table has an index on CountryFlag and JoinColumn, and whether TableA and TableB have an index on JoinColumn.
The basic structure is:
select . . ., coalesce(a.?, b.?) as ?
from tablecore c left join
tablea a
on c.? = a.? and c.countryflag = 'US' left join
tableb b
on c.? = b.? and c.countryflag = 'global';
This version of the query can take advantage of indexes on tablea(?) and tableb(?).
If you have a complex query, this portion is probably not responsible for the performance problem.
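For comparison, both approaches from the answers above can be run side by side. This sketch assumes SQLite via Python's sqlite3; the column names id, joincolumn and val and all rows are hypothetical, chosen only to fill in the ? placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablecore (id INT, countryflag TEXT, joincolumn INT);
CREATE TABLE tablea (joincolumn INT, val TEXT);
CREATE TABLE tableb (joincolumn INT, val TEXT);
INSERT INTO tablecore VALUES (1, 'US', 10), (2, 'global', 20), (3, 'US', 30), (4, 'global', 40);
INSERT INTO tablea VALUES (10, 'a10'), (30, 'a30'), (20, 'a20');
INSERT INTO tableb VALUES (20, 'b20'), (10, 'b10');
""")

# Two left joins, each guarded by the flag; COALESCE picks whichever side matched.
guarded_left_joins = conn.execute("""
SELECT c.id, COALESCE(a.val, b.val) AS val
FROM tablecore c
LEFT JOIN tablea a ON c.joincolumn = a.joincolumn AND c.countryflag = 'US'
LEFT JOIN tableb b ON c.joincolumn = b.joincolumn AND c.countryflag = 'global'
ORDER BY c.id
""").fetchall()

# Two inner joins, one per flag value, glued together with UNION ALL.
union_all = conn.execute("""
SELECT c.id, t.val
FROM tablecore c
INNER JOIN tablea t ON c.joincolumn = t.joincolumn
WHERE c.countryflag = 'US'
UNION ALL
SELECT c.id, t.val
FROM tablecore c
INNER JOIN tableb t ON c.joincolumn = t.joincolumn
WHERE c.countryflag = 'global'
ORDER BY 1
""").fetchall()

print(guarded_left_joins)
print(union_all)
```

The two forms agree on every core row that has a partner in the right table. They differ for core row 4, which has no partner: the left-join form returns it with NULL, while the inner-join UNION ALL form omits it.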

Left outer join with 2 column missing some output rows

When I select all rows from table zvw_test, it returns 145 rows.
Table Customer_Class_Price has 160 rows.
When I join these 2 tables on 2 conditions, the query returns 122 rows.
I don't understand why it does not return all rows from zvw_test (145 rows),
because I use a left outer join, which should return all rows from the left table.
Thank you.
SELECT zvw_test.Goods_ID,
zvw_test.Thai_Name,
zvw_test.UM,
zvw_test.CBal,
Customer_Class_Price.ListPrice
FROM zvw_test
LEFT OUTER JOIN
Customer_Class_Price ON zvw_test.Goods_ID = Customer_Class_Price.Goods_ID AND
zvw_test.UM = Customer_Class_Price.UM
WHERE (Customer_Class_Price.ClassCode = '444-666')
By putting one of your columns from the LEFT OUTER JOIN table in your WHERE clause, you have effectively turned it into an INNER JOIN. You need to move that up to the JOIN clause.
I had this problem before and solved it with a CTE, like:
WITH A AS
(
SELECT Customer_Class_Price.Goods_ID, Customer_Class_Price.UM, Customer_Class_Price.ListPrice
FROM Customer_Class_Price
WHERE Customer_Class_Price.ClassCode = '444-666'
)
SELECT zvw_test.Goods_ID, zvw_test.Thai_Name, zvw_test.UM, zvw_test.CBal, A.ListPrice
FROM zvw_test LEFT OUTER JOIN A
ON zvw_test.Goods_ID = A.Goods_ID AND zvw_test.UM = A.UM
You demand in your WHERE clause:
(Customer_Class_Price.ClassCode = '444-666')
Ergo you are not selecting rows where Customer_Class_Price.ClassCode IS NULL. Customer_Class_Price.ClassCode would be NULL if there is no corresponding row, but you are filtering those out explicitly.
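The three variants discussed in this thread (filter in WHERE, filter in the JOIN clause, filter inside a CTE) can be compared on a small dataset. This is a sketch assuming SQLite via Python's sqlite3, with invented goods and prices and only the columns needed for the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zvw_test (Goods_ID TEXT, UM TEXT);
CREATE TABLE Customer_Class_Price (Goods_ID TEXT, UM TEXT, ClassCode TEXT, ListPrice REAL);
INSERT INTO zvw_test VALUES ('G1', 'EA'), ('G2', 'EA'), ('G3', 'BOX');
INSERT INTO Customer_Class_Price VALUES
  ('G1', 'EA', '444-666', 10.0),
  ('G1', 'EA', '111-222', 12.0),
  ('G2', 'EA', '111-222', 20.0);
""")


def run(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Filter in WHERE: NULL-extended rows fail the test, so the join acts like an inner join.
filter_in_where = run("""
SELECT z.Goods_ID, p.ListPrice
FROM zvw_test z
LEFT OUTER JOIN Customer_Class_Price p
  ON z.Goods_ID = p.Goods_ID AND z.UM = p.UM
WHERE p.ClassCode = '444-666'
ORDER BY z.Goods_ID
""")

# Filter in the JOIN clause: all rows of zvw_test are kept.
filter_in_on = run("""
SELECT z.Goods_ID, p.ListPrice
FROM zvw_test z
LEFT OUTER JOIN Customer_Class_Price p
  ON z.Goods_ID = p.Goods_ID AND z.UM = p.UM AND p.ClassCode = '444-666'
ORDER BY z.Goods_ID
""")

# Filter inside a CTE, then left join to the pre-filtered rows.
filter_in_cte = run("""
WITH A AS (
  SELECT Goods_ID, UM, ListPrice
  FROM Customer_Class_Price
  WHERE ClassCode = '444-666'
)
SELECT z.Goods_ID, A.ListPrice
FROM zvw_test z
LEFT OUTER JOIN A ON z.Goods_ID = A.Goods_ID AND z.UM = A.UM
ORDER BY z.Goods_ID
""")

print(filter_in_where)
print(filter_in_on)
print(filter_in_cte)
```

On this data the JOIN-clause version and the CTE version return the same rows, so the choice between them is mostly a matter of readability.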

Oracle left outer join, only want the null values

I'm working on a problem with two tables. Charge and ChargeHistory. I want to display a selection of columns from both tables where either the matching row in ChargeHistory has a different value and/or date from Charge or if there is no matching entry in ChargeHistory at all.
I'm using a left outer join declared using the ansi standard and while it does show the rows correctly where there is a difference, it isn't showing the null entries.
I've read that there can sometimes be issues if you are using the WHERE clause as well as the ON clause. However, when I try to put all the conditions in the ON clause, the query takes too long, more than 15 minutes (so long that I have just cancelled the runs).
To make things worse both tables use a three part compound key.
Does anyone have any ideas as to why the null values are being left out?
SELECT values...
FROM bcharge charge
LEFT OUTER JOIN chgHist history
ON charge.key1 = history.key1 AND charge.key2 = history.key2 AND charge.key3 = history.key3 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2'
AND (charge.value <> history.value OR charge.date <> history.date)
ORDER BY key1, key2, key
You probably want to explicitly select the null values:
SELECT values...
FROM bcharge charge
LEFT OUTER JOIN chgHist history
ON charge.key1 = history.key1 AND charge.key2 = history.key2 AND charge.key3 = history.key3 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2'
AND ((charge.value <> history.value or history.value is null) OR (charge.date <> history.date or history.date is null))
ORDER BY key1, key2, key
You can explicitly look for a match in the where. I would recommend looking at one of the keys used for the join:
SELECT . . .
FROM bcharge charge LEFT OUTER JOIN
chgHist history
ON charge.key1 = history.key1 AND charge.key2 = history.key2 AND
charge.key3 = history.key3 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2' AND
(charge.value <> history.value OR charge.date <> history.date OR history.key1 is null)
ORDER BY key1, key2, key;
The expressions charge.value <> history.value change the left outer join to an inner join because NULL results will be filtered out.
A WHERE clause filters the data returned by a join. Therefore when your inner table has null data for a particular column, the corresponding rows get filtered out based on your specified condition. That is why you should move that logic to the ON clause instead.
For the performance issues, you could consider adding indexes on the columns used for joining and filtering.
Have a look at this site. It gives a visual illustration of all the join statements, with code samples, and should be very helpful:
blog.codinghorror.com
Quote of the relevant info from the above link:
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
Sample output:
id  name       id    name
--  ----       --    ----
1   Pirate     2     Pirate
2   Monkey     null  null
3   Ninja      4     Ninja
4   Spaghetti  null  null
Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.
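The sample output quoted above can be reproduced with a few lines. This sketch assumes SQLite via Python's sqlite3; the TableA rows follow the quoted output, while the two TableB rows that never match (ids 1 and 3) are filled in as an assumption, since the quote does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INT, name TEXT);
CREATE TABLE TableB (id INT, name TEXT);
INSERT INTO TableA VALUES (1, 'Pirate'), (2, 'Monkey'), (3, 'Ninja'), (4, 'Spaghetti');
-- Rows 1 and 3 of TableB are assumed; they match nothing in TableA.
INSERT INTO TableB VALUES (1, 'Rutabaga'), (2, 'Pirate'), (3, 'Darth Vader'), (4, 'Ninja');
""")

rows = conn.execute("""
SELECT * FROM TableA
LEFT OUTER JOIN TableB
  ON TableA.name = TableB.name
ORDER BY TableA.id
""").fetchall()

for row in rows:
    print(row)  # Python prints SQL NULL as None
```

Every TableA row appears exactly once here because each name has at most one partner in TableB. The unmatched TableB rows never show up in a left outer join.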
For any field from an outer joined table used in the where clause you must also permit an IS NULL option for that same field, otherwise you negate the effect of the outer join and the result is the same as if you had used an inner join.
SELECT
*
FROM bcharge charge
LEFT OUTER JOIN chgHist history
ON charge.key1 = history.key1
AND charge.key2 = history.key2
AND charge.key3 = history.key3
AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2'
AND (
(charge.value <> history.value OR history.value IS NULL)
OR
(charge.date <> history.date OR history.date IS NULL)
)
ORDER BY
key1, key2, key3
Edit: Appears that this is the same query structure used by Rene above, so treat this one as in support of that please.
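To illustrate why the plain inequality loses the unmatched rows and how an IS NULL test on a join key brings them back, here is a reduced sketch. It assumes SQLite via Python's sqlite3 rather than Oracle, collapses the three-part key to a single key1, compares only one hypothetical column named amount (standing in for value and date), and uses invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bcharge (key1 INT, chargeType TEXT, amount INT);
CREATE TABLE chgHist (key1 INT, chargeType TEXT, amount INT);
INSERT INTO bcharge VALUES (1, '2', 100), (2, '2', 200), (3, '2', 300), (4, '1', 400);
INSERT INTO chgHist VALUES (1, '2', 100), (2, '2', 250);
""")

base = """
SELECT charge.key1, charge.amount, history.amount
FROM bcharge charge
LEFT OUTER JOIN chgHist history
  ON charge.key1 = history.key1
 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2'
  AND ({condition})
ORDER BY charge.key1
"""

# Comparing against a NULL-extended row yields NULL, which WHERE treats as not true.
only_inequality = conn.execute(
    base.format(condition="charge.amount <> history.amount")
).fetchall()

# Testing a join key for NULL keeps the charges that have no history row at all.
with_null_check = conn.execute(
    base.format(condition="charge.amount <> history.amount OR history.key1 IS NULL")
).fetchall()

print(only_inequality)
print(with_null_check)
```

Testing a join key such as history.key1, rather than a data column, separates "no history row" from "history row whose value happens to be NULL", which is the reason given above for preferring it.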