Left Outer Join Not Working? - sql

I have a query pulling data from three tables using LEFT OUTER JOIN for both joins. I need the query to return the left most (Salesrep table) info even if the there is no corresponding data in the two right tables (prescriber and prescriptions, respectively). When I run this query without the date parameters in the WHERE clause, I get the expected return, but as soon as I include the date parameters I get nothing returned where there is no matching data for a salesrep. I need to at least see the salesrep table columns requested in the query.
Here is the query... any help is VERY much appreciated.
SELECT salesrep.salesrepid as SalesRepID,
salesrep.fname as SalesrepFName,
salesrep.lname as SalesRepLName,
salesrep.fname+' '+salesrep.lname as SalesRepFullName,
prescriber.dea_no as PDeaNo,
prescriber.lname+', '+prescriber.fname as DocName,
CONVERT(VARCHAR(8), prescriptions.filldate, 1) as FillDate,
prescriptions.drugname as DrugName,
prescriptions.daysupply as Supply,
prescriptions.qtydisp as QtyDisp,
prescriptions.rx_no as Refill,
prescriptions.copay as Sample,
ROUND(prescriptions.AgreedToPay-(prescriptions.AgreedToPay*.07),2) as AgreedToPay,
prescriptions.carrierid as CarrierID
FROM salesrep
LEFT OUTER JOIN prescriber on salesrep.salesrepid = prescriber.salesrepid
LEFT OUTER JOIN prescriptions on prescriber.dea_no = prescriptions.dea_no
WHERE salesrep.salesrepid = 143 AND
prescriptions.filldate >= '09-01-12' AND
prescriptions.filldate <= '09-17-12'
ORDER BY prescriptions.filldate

You should move the constraints on prescriptions.filldate into the ON condition of the join, and remove it from the where clause:
LEFT OUTER JOIN prescriptions ON prescriber.dea_no = prescriptions.dea_no
AND prescriptions.filldate >= '09-01-12'
AND prescriptions.filldate <= '09-17-12'
Otherwise, entries for which there are no prescriptions end up with nulls in prescriptions.filldate, and the WHERE clause throws them away.

Here you can find a brief description about query processing phases (it's common for most DBMSes). You will find out there, that for OUTER JOIN:
first CARTESIAN JOIN is produced,
than the ON condition is performed on result set producing subset of rows,
after than outer rows are appended with NULLs on inner table's joined columns,
on that result the WHERE clause is applied performing filtering.
When you place the condition within WHERE clause which touches outer tables rows they're all discarded. You should simply place that condition within the ON clause, as that one is evaluated before outer rows are appended.
So, those conditions:
prescriptions.filldate >= '09-01-12' AND
prescriptions.filldate <= '09-17-12'
should be moved into ON clause.

This fiddle can be useful for illustrating that:
a restriction placed in the ON clause is processed before the join, while a restriction placed in the WHERE clause is processed after the join.
Note that does not matter with inner joins, but it matters a lot with outer joins. More details in docs
Table t1
| num | name |
| --- | ---- |
| 1 | a |
| 2 | b |
| 3 | c |
Table t2
| num | value |
| --- | ----- |
| 1 | xxx |
| 3 | yyy |
| 5 | zzz |
Join condition in the ON clause
SELECT * FROM t1
LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';
| num | name | num | value |
| --- | ---- | --- | ----- |
| 1 | a | 1 | xxx |
| 2 | b | | |
| 3 | c | | |
Join condition in the WHERE clause
SELECT * FROM t1
LEFT JOIN t2 ON t1.num = t2.num
WHERE t2.value = 'xxx';
| num | name | num | value |
| --- | ---- | --- | ----- |
| 1 | a | 1 | xxx |
View on DB Fiddle

This is because your prescriptions.filldate inequalities are filtering out your salesrep rows that don't have a value in the prescriptions.filldate column.
So if there are null values (no matching data from the right tables), then the entire row, including the salesrep data is filtered out by the date filters - because the null doesn't fall between the two dates.

Related

Linking tables on multiple criteria

I've got myself in a bit of a mess on something I'm doing where I'm trying to get two tables linked together based on multiple bits of info.
I want to link one table to another based on the basic rules of(in this hierarchy)
where main linking is where orderid matches between the two tables
records from table 2 where valid=Y,
from those i want the valid records which has the highest seqn1 number and then from those the one that has the highest seqn2 value
table1
orderid | date | otherinfo
223344 | 22/10/2020 | okokkokokooeodijjf
table2
orderid | seqn1 | seqn2 | valid | additonaldata
223344 | 1 | 3 | y | sdfsfsf
223344 | 2 | 1 | y | sffferfr
223344 | 2 | 2 | y | sfrfrefr -- This row
223344 | 2 | 3 | n | rfrg66rr
223344 | 2 | 4 | n | adwere
223344 | 3 | 4 | n | adwere
so would want the final record to be
orderid | date | otherinfo | seqn1 | seqn2 | valid | additonaldata
223344 | 22/10/2020 | okokkokokooeodijjf | 2 | 2 | y | sfrfrefr
I started off with the code below but I'm not sure I'm doing it right and I can't seem to get it to pay attention to the valid flag when i try to add it in.
SELECT * FROM table1
left JOIN table2
ON table1.orderid = table2.orderid
AND table2.seqn1 = (SELECT MAX(table2.seqn1) FROM table2 WHERE table1.orderid = table2.orderid)
AND table2.seqn2 = (SELECT MAX(table2.seqn2) FROM table2 WHERE table1.orderid = table2.orderid
AND table2.seqn1 = (SELECT MAX(table2.seqn1) FROM table2 WHERE table1.orderid = table2.orderid))
Could someone help me amend the code please.
Use row_number analytic function with partition by orderid and order by SEQNRs in the order you need. No need for multiple subselects. To add more selections for the single row, use CASE to map your values to numbers and order by them also.
Fiddle here.
with l as (
select *,
rank() over(partition by orderid order by seqn1 desc, seqn2 desc) as rn
from line
where valid = 'y'
)
select *
from header as h
join l
on h.orderid = l.orderid
and l.rn = 1
How about something like this:
;
with cte_table2 as
(
SELECT ordered
,MAX(seqn1) as seqn1
,MAX(seqn2) as seqn2
FROM table2
where valid = 'y'
group by ordered --check if you need to add 'valid' to the group by but I don't think so.
)
SELECT
t1.*
,t3.otherinfo
--,t3.[OtherFields]
from table1 t1
inner join cte_table2 t2 on t1.orderid = t2.orderid -- first match on id
left join table2 t3 on t3.orderid = t2.orderid and t3.seqn1 = t2.seqn1 and t3.seqn2 = t2.seqn2

Create a flag in a left join

I am trying to create a new column (a sort of identifier flag) for the "Null" rows resulting of my following left join :
with CTE (...) as (
... unrelated code
) select * from CTE
left join (select columnID from table1) Pu
on CTE.columnID = Pu.columnID
left join (select case when bz.column2 is null then 'null test is working' else columnID2, column2 end FROM table2) Bz
ON CTE.columnID2 = Bz.columnID2
This code is working properly when I don't try to use a 'case when'. Actually, you could very well ignore the first left join.
My purpose would be to be able test the left join result while doing it, and act depending on the result :
If the left join result give a null row : creation of a flag column for the row,
If the left join result give a normal row : the left join is done normally, and the flag column is empty (as I suspect it cant be un existent).
I'd be glad if you could give me a hand!
EDIT : tables example:
CTE
| columnID | columnID2 | InformationsCTE |
| ab | mp | randominfo1 |
| ac | ma | randominfo2 |
| ae | me | randominfo3 |
| ad | mb | randominfo4 |
table2
| columnID2 | InformationsTable2 |
| mp | randominfo5 |
| ma | randominfo6 |
| me | randominfo7 |
Result after the second left join :
new CTE
| columnID | columnID2 | InformationsCTE | InformationsTable2| FLAG |
| ab | mp | randominfo1 | randominfo5 | OK |
| ac | ma | randominfo2 | randominfo6 | OK |
| ae | me | randominfo3 | randominfo7 | OK |
| ad | mb | randominfo4 | NULL | NOK |
Just use
T-SQL:
SELECT ISNULL(Column_to_check,'flag') FROM SomeTable
PL/SQL:
SELECT NVL(Column_to_check,'flag') FROM SomeTable
Also use NVL2 as below if you want to return other value from the Column_to_check:
NVL2(Column_to_check, value_if_NOT_null, value_if_null )
Would it not be more practical to SELECT this column, use the ISNULL operator and just use a straightforward LEFT JOIN? I feel like you're over-complicating it a bit.
Something like:
with CTE (...) as (
... unrelated code
)
SELECT CTE.*, NVL(bz.InformationsTable2, 'TEST OK')
FROM CTE
LEFT JOIN table2 Bz ON CTE.columnID2 = Bz.columnID2
EDIT: Based on your example table, if you join on the ID, then use NVL on the other column, it should work for you.
Here is an example I prepared for a previous question: SQL Fiddle
Example was build in mysql, so beware syntax, but logically it works the same way
Why the joins? It seems you only want to look up data in other table, for which you'd use EXISTS or IN:
with cte (...) as (
... unrelated code
)
select
cte.*,
case when columnid in (select columnid from table1) then 'okay' else 'fail' end as test1,
case when columnid2 in (select columnid2 from table2) then 'okay' else 'fail' end as test2
from cte;

Hive / SQL - Left join with fallback

In Apache Hive I have to tables I would like to left-join keeping all the data from the left data and adding data where possible from the right table.
For this I use two joins, because the join is based on two fields (a material_id and a location_id).
This works fine with two traditional left joins:
SELECT
a.*,
b.*
FROM a
INNER JOIN (some more complex select) b
ON a.material_id=b.material_id
AND a.location_id=b.location_id;
For the location_id the database only contains two distinct values, say 1 and 2.
We now have the requirement that if there is no "perfect match", this means that only the material_id can be joined and there is no correct combination of material_id and location_id (e.g. material_id=100 and location_id=1) for the join for the location_id in the b-table, the join should "default" or "fallback" to the other possible value of the location_id e.g. material_id=001 and location_id=2 and vice versa. This should only be the case for the location_id.
We have already looked into all possible answers also with CASE etc. but to no prevail. A setup like
...
ON a.material_id=b.material_id AND a.location_id=
CASE WHEN a.location_id = b.location_id THEN b.location_id ELSE ...;
we tried or did not figure out how really to do in hive query language.
Thank you for your help! Maybe somebody has a smart idea.
Here is some sample data:
Table a
| material_id | location_id | other_column_a |
| 100 | 1 | 45 |
| 101 | 1 | 45 |
| 103 | 1 | 45 |
| 103 | 2 | 45 |
Table b
| material_id | location_id | other_column_b |
| 100 | 1 | 66 |
| 102 | 1 | 76 |
| 103 | 2 | 88 |
Left - Join Table
| material_id | location_id | other_column_a | other_column_b
| 100 | 1 | 45 | 66
| 101 | 1 | 45 | NULL (mat. not in b)
| 103 | 1 | 45 | DEFAULT TO where location_id=2 (88)
| 103 | 2 | 45 | 88
PS: As stated here exists etc. does not work in the sub-query ON.
The solution is to left join without a.location_id = b.location_id and number all rows in order of preference. Then filter by row_number. In the code below the join will duplicate rows first because all matching material_id will be joined, then row_number() function will assign 1 to rows where a.location_id = b.location_id and 2 to rows where a.location_id <> b.location_id if exist also rows where a.location_id = b.location_id and 1 if there are not exist such. b.location_id added to the order by in the row_number() function so it will "prefer" rows with lower b.location_id in case there are no exact matching. I hope you have caught the idea.
select * from
(
SELECT
a.*,
b.*,
row_number() over(partition by material_id
order by CASE WHEN a.location_id = b.location_id THEN 1 ELSE 2 END, b.location_id ) as rn
FROM a
LEFT JOIN (some more complex select) b
ON a.material_id=b.material_id
)s
where rn=1
;
Maybe this is helpful for somebody in the future:
We also came up with a different approach.
First, we create another table to calculate averages from the table b based on material_id over all (!) locations.
Second, In the join table we create three columns:
c1 - the value where material_id and location_id are matching (result from a left join of table a with table b). This column is null if there is no perfect match.
c2 - the value from the table where we write the number from the averages (fallback) table for this material_id (regardless of the location)
c3 - the "actual value" column where we use a case statement to decide if when the column 1 is NULL (there is no perfect match of material and location) then we use the value from column 2 (the average over all the other locations for the material) for the further calculations.

Remove NULL values from multiple LEFT JOIN in sql server

I have the following tables
ITEM1
ID | NAME | GEARS | ITEM2_ID |
-------------------------------
1 | Test | 56 | 4 |
2 | Test2| 12 | 2 |
ITEM3
ID | NAME | DATA | ITEM2_ID |
-------------------------------
1 | Test | 1 | 1 |
2 | Test7| 22 | 3 |
ITEM2
ID | VALUE |
--------------------
1 | is simple |
2 | is hard |
3 | is different|
4 | is good |
5 | very good |
And my query
SELECT TOP(3) * FROM (
SELECT ID,
rankTable.RANK as RANK_,
TOTALROWS = COUNT(*) OVER()
FROM ITEM2
INNER JOIN
CONTAINSTABLE(ITEM2, [VALUE], 'ISABOUT("good")') as rankTable
ON ITEM2.ID = rankTable.[KEY]
) as ITEM2table
LEFT JOIN (
SELECT ID,
NAME,
GEARS,
ITEM2_ID
FROM ITEM1
) as ITEM1table
ON ITEM1table.ITEM2_ID = ITEM2table.ID
LEFT JOIN (
SELECT ID,
NAME,
DATA,
ITEM2_ID
FROM ITEM3
) as ITEM3table
ON ITEM3table.ITEM2_ID = ITEM2table.ID
and the results
How to remove (if is possible) the first row (ID = 5) using the above SQL query ? Also I want to show TOTALROWS = 1 because other row contains NULL's except first 3 columns.
Thank you.
If I understand correctly, you want to keep only the rows where either the first or the second (or both) outer join succeeds:
WHERE ITEM1table.ITEM2_ID IS NOT NULL
OR ITEM3table.ITEM2_ID IS NOT NULL
Some simplifications can be done on the query. No need for the nested subqueries:
SELECT TOP(3)
ITEM2table.ID,
rankTable.RANK as RANK_,
TOTALROWS = COUNT(*) OVER(),
ITEM1table.*,
ITEM3table.*
FROM
ITEM2
INNER JOIN
CONTAINSTABLE(ITEM2, [VALUE], 'ISABOUT("good")') as rankTable
ON ITEM2.ID = rankTable.[KEY]
LEFT JOIN
ITEM1 as ITEM1table
ON ITEM1table.ITEM2_ID = ITEM2.ID
LEFT JOIN
ITEM3 as ITEM3table
ON ITEM3table.ITEM2_ID = ITEM2.ID
WHERE ITEM1table.ITEM2_ID IS NOT NULL
OR ITEM3table.ITEM2_ID IS NOT NULL
ORDER BY something --- you need to order by something
--- if you use TOP. Unless you want
--- 3 (random) rows.
Maybe there's an obvious reason, but if you want to eliminate rows where the second table doesn't have a match, why are you using a left join? It seems like your first join should be an inner join and your second should be left - that would give you the results you want in this case.
You can either use INNER JOIN instead of LEFT JOIN, or put
WHERE ITEM1table.ID IS NOT NULL AND ITEM3table.ID IS NOT NULL
at the end of your query

SQL return max value from child for each parent row

I have 2 tables - 1 with parent records, 1 with child records. For each parent record, I'm trying to return a single child record with the MAX(SalesPriceEach).
Additionally I'd like to only return a value when there is more than 1 child record.
parent - SalesTransactions table:
+-------------------+---------+
|SalesTransaction_ID| text |
+-------------------+---------+
| 1 | Blah |
| 2 | Blah2 |
| 3 | Blah3 |
+-------------------+---------+
child - SalesTransactionLines table
+--+-------------------+---------+--------------+
|id|SalesTransaction_ID|StockCode|SalesPriceEach|
+--+-------------------+---------+--------------+
| 1| 1 | 123 | 99 |
| 2| 1 | 35 | 50 |
| 3| 2 | 15 | 75 |
+--+-------------------+---------+--------------+
desired results
+-------------------+---------+--------------+
|SalesTransaction_ID|StockCode|SalesPriceEach|
+-------------------+---------+--------------+
| 1 | 123 | 99 |
| 2 | 15 | 75 |
+-------------------+---------+--------------+
I found a very similar question here, and based my query on the answer but am not seeing the results I expect.
WITH max_feature AS (
SELECT c.StockCode,
c.SalesTransaction_ID,
MAX(c.SalesPriceEach) as feature
FROM SalesTransactionLines c
GROUP BY c.StockCode, c.SalesTransaction_ID)
SELECT p.SalesTransaction_ID,
mf.StockCode,
mf.feature
FROM SalesTransactions p
LEFT JOIN max_feature mf ON mf.SalesTransaction_ID = p.SalesTransaction_ID
The results from this query are returning multiple rows for each parent, and not even the highest value first!
select stl.SalesTransaction_ID, stl.StockCode, ss.MaxSalesPriceEach
from SalesTransactionLines stl
inner join
(
select stl2.SalesTransaction_ID, max(stl2.SalesPriceEach) MaxSalesPriceEach
from SalesTransactionLines stl2
group by stl2.SalesTransaction_ID
having count(*) > 1
) ss on (ss.SalesTransaction_ID = stl.SalesTransaction_ID and
ss.MaxSalesPriceEach = stl.SalesPriceEach)
OR, alternatively:
SELECT stl1.*
FROM SalesTransactionLines AS stl1
LEFT OUTER JOIN SalesTransactionLines AS stl2
ON (stl1.SalesTransaction_ID = stl2.SalesTransaction_ID
AND stl1.SalesPriceEach < stl2.SalesPriceEach)
WHERE stl2.SalesPriceEach IS NULL;
I know I'm a year late to this party but I always prefer using Row_Number in these situations. It solves the problem when there are two rows that meet your Max criteria and makes sure that only one row is returned:
with z as (
select
st.SalesTransaction_ID
,row=ROW_NUMBER() OVER(PARTITION BY st.SalesTransaction_ID ORDER BY stl.SalesPriceEach DESC)
,stl.StockCode
,stl.SalesPriceEach
from
SalesTransactions st
inner join SalesTransactionLines stl on stl.SalesTransaction_ID = st.SalesTransaction_ID
)
select * from z where row = 1
SELECT SalesTransactions.SalesTransaction_ID,
SalesTransactionLines.StockCode,
MAX(SalesTransactionLines.SalesPriceEach)
FROM SalesTransactions RIGHT JOIN SalesTransactionLines
ON SalesTransactions.SalesTransaction_ID = SalesTransactionLines.SalesTransaction_ID
GROUP BY SalesTransactions.SalesTransaction_ID, alesTransactionLines.StockCode;
select a.SalesTransaction_ID, a.StockCode, a.SalesPriceEach
from SalesTransacions as a
inner join (select SalesTransaction_ID, MAX(SalesPriceEach) as SalesPriceEach
from SalesTransactionLines group by SalesTransaction_ID) as b
on a.SalesTransaction_ID = b.SalesTransaction_ID
and a.SalesPriceEach = b.SalesPriceEach
subquery returns table with trans ids and their maximums so just join it with transactions table itself by those 2 values