LEFT OUTER JOIN with 'field IS NULL' in WHERE works as INNER JOIN - sql

Today I ran into some behavior in PostgreSQL that I can't explain: a LEFT OUTER JOIN does not return rows for the main table (with NULLs for the joined table's fields) when the joined table's fields are used in the WHERE clause.
To make the case easier to grasp, here is an example. Say we have two tables: item, holding some goods, and price, which references item and holds prices for the goods in different years:
CREATE TABLE item(
id INTEGER PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE price(
id INTEGER PRIMARY KEY,
item_id INTEGER NOT NULL,
year INTEGER NOT NULL,
value INTEGER NOT NULL,
CONSTRAINT goods_fk FOREIGN KEY (item_id) REFERENCES item(id)
);
The table item has 2 records (TV set and VCR items), and the table price has 3 records, a price for TV set in years 2000 and 2010, and a price for VCR for year 2000 only:
INSERT INTO item(id, name)
VALUES
(1, 'TV set'),
(2, 'VCR');
INSERT INTO price(id, item_id, year, value)
VALUES
(1, 1, 2000, 290),
(2, 1, 2010, 270),
(3, 2, 2000, 770);
-- no price of VCR for 2010
Now let's make a LEFT OUTER JOIN query, to get prices for all items for year 2010:
SELECT
i.*,
p.year,
p.value
FROM item i
LEFT OUTER JOIN price p ON i.id = p.item_id
WHERE p.year = 2010 OR p.year IS NULL;
For some reason, this query returns a result only for the TV set, which has a price for that year. The record for the VCR is absent from the results:
id | name | year | value
----+--------+------+-------
1 | TV set | 2010 | 270
(1 row)
After some experimenting, I found a way to make the query return the results I need (all records from the item table, with NULLs in the fields of the joined table when there are no matching records for the year). It was achieved by moving the year filter out of WHERE and into a subquery that is then joined:
SELECT
i.*,
p.year,
p.value
FROM item i
LEFT OUTER JOIN (
SELECT * FROM price
WHERE year = 2010 -- <= here I filter a year
) p ON i.id = p.item_id;
And now the result is:
id | name | year | value
----+--------+------+-------
1 | TV set | 2010 | 270
2 | VCR | |
(2 rows)
My main question is: why does the first query (with the year filter in WHERE) not work as expected, and instead turn into something like an INNER JOIN?
I'm severely blocked by this issue on my current project, so I'd be grateful for tips/hints on the following related questions too:
Are there any other options to achieve the proper results?
... especially ones that translate easily to a Django ORM queryset?
Upd: @astentx suggested moving the filter condition directly into the JOIN (and it works too):
SELECT
i.*,
p.year,
p.value
FROM item i
LEFT OUTER JOIN price p
ON
i.id = p.item_id
AND p.year = 2010;
However, as with my first solution, I don't see how to express it in terms of Django ORM querysets. Are there any other suggestions?

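The first query does not work as expected because the expectation is wrong. It does not behave as an INNER JOIN either. The WHERE clause is applied after the join, so the VCR row, which was joined to its 2000 price, has p.year = 2000 (neither 2010 nor NULL) and is filtered out. The query would return a record for the VCR only if there were no price for the VCR at all. One alternative is to cross join the wanted year and put both conditions in the ON clause: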

SELECT
i.*,
y.year,
p.value
FROM item i
CROSS JOIN (SELECT 2010 AS year) y -- here could be a table
LEFT OUTER JOIN price p
ON (p.item_id = i.id
AND p.year = y.year);
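To see why the WHERE version loses the VCR row, it can help to look at the join result before any filtering. The following is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL (an assumption made only so the example is self-contained; the join semantics shown are the same). The variable names are illustrative.

```python
import sqlite3

# In-memory database with the item/price data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE price(id INTEGER PRIMARY KEY, item_id INTEGER NOT NULL,
                   year INTEGER NOT NULL, value INTEGER NOT NULL);
INSERT INTO item VALUES (1, 'TV set'), (2, 'VCR');
INSERT INTO price VALUES (1, 1, 2000, 290), (2, 1, 2010, 270), (3, 2, 2000, 770);
""")

# 1. The plain LEFT JOIN: VCR matches its 2000 price, so p.year is 2000, not NULL.
joined = conn.execute("""
    SELECT i.name, p.year, p.value
    FROM item i LEFT JOIN price p ON i.id = p.item_id
    ORDER BY i.id, p.year
""").fetchall()

# 2. Filtering in WHERE runs after the join and removes the VCR row entirely.
where_filtered = conn.execute("""
    SELECT i.name, p.year, p.value
    FROM item i LEFT JOIN price p ON i.id = p.item_id
    WHERE p.year = 2010 OR p.year IS NULL
    ORDER BY i.id
""").fetchall()

# 3. Filtering in ON restricts which price rows can match; unmatched items stay.
on_filtered = conn.execute("""
    SELECT i.name, p.year, p.value
    FROM item i LEFT JOIN price p ON i.id = p.item_id AND p.year = 2010
    ORDER BY i.id
""").fetchall()

print(joined)
print(where_filtered)
print(on_filtered)
```

The first result contains a VCR row with year 2000, which is exactly the row the WHERE clause then discards; only the ON version ends with a NULL-extended VCR row.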

Related

Return stock in a warehouse even if there is no row for that given stock

I have 5 different tables:
Toasters: product name (foreign key to products and primary key), slots, serial
Microwaves: product name (same as toaster), wattage
Products: product name (primary key)
Stock: product (fk to product), warehouse (fk to warehouse), amount
Warehouse: name (primary key)
Toasters and microwaves are child tables of products (although it's not using Postgres inheritance, since there are issues with it). They represent different models of toasters and microwaves (simplified to just slots and wattage here). Every toaster and microwave has exactly 1 entry in the products table.
Now the goal is to create a query that gives me the amount of each product in each warehouse for a given list of product names. The problem is that some warehouses may not have a stock entry for a certain product. Each warehouse has either one stock row per product or none.
I have managed to make it work for a single warehouse:
--join together all 3 product tables and select all desired products
WITH selIProducts AS(
SELECT
--Get the products category by checking if the table is part of the query
(CASE
WHEN toasters IS NOT NULL THEN 'toasters'
WHEN microwaves IS NOT NULL THEN 'microwaves'
ELSE 'ERROR'
END) as category,
products.name as productName,
*
FROM products
--I need a full join to include everything
FULL JOIN toasters ON toasters.name=products.name
FULL JOIN microwaves ON microwaves.name=products.name
WHERE
products.name IN (
'TOASTMASTER 3000',
'TOASTMASTER 3000Rev01',
'A3452 Ultra Microwave Oven'
)
),
warehouseStock AS
(
--only works with one inventory
SELECT * FROM STOCK
WHERE stock.warehouse='WH-1'
)
-- left join to ensure all item categories are included
SELECT COALESCE(warehouseStock.amount,0) as amount,* FROM selIProducts
LEFT JOIN warehouseStock ON selIProducts.itemId=warehouseStock.item
I tried replacing WHERE stock.warehouse='WH-1' with WHERE stock.warehouse IN ('WH-1','WH-2'), but that doesn't work since the desired products are only joined once, instead of once per warehouse.
The final result should look like this:
Warehouse | productName                | amount | wattage | slots | category
----------+----------------------------+--------+---------+-------+-----------
WH-1      | TOASTMASTER 3000           | 0      | null    | 2     | toasters
WH-1      | TOASTMASTER 3000Rev01      | 1      | null    | 3     | toasters
WH-1      | A3452 Ultra Microwave Oven | 1      | 3000    | null  | microwave
WH-2      | TOASTMASTER 3000           | 2      | null    | 2     | toasters
WH-2      | TOASTMASTER 3000Rev01      | 0      | null    | 3     | toasters
WH-2      | A3452 Ultra Microwave Oven | 0      | 3000    | null  | microwave
I don't know how I should get Postgres to return a row (with a null or 0 amount) when there isn't a stock entry in a given warehouse.
Does anybody have any ideas?
You seem to want all products and all warehouses. That suggests a cross join to generate the rows:
SELECT v.warehouse, p.productname,
COALESCE(s.amount, 0) as amount
FROM selIProducts p CROSS JOIN
(VALUES ('WH-1'), ('WH-2')) v(warehouse) LEFT JOIN
stock s
ON p.itemId = s.item AND v.warehouse = s.warehouse;
You might have another source for the warehouses, if you don't want to list them explicitly.
Add a table of warehouses wanted.
WITH selIProducts AS(
SELECT
--Get the products category by checking if the table is part of the query
(CASE
WHEN toasters IS NOT NULL THEN 'toasters'
WHEN microwaves IS NOT NULL THEN 'microwaves'
ELSE 'ERROR'
END) as category,
products.name as productName,
*
FROM products
--I need a full join to include everything
FULL JOIN toasters ON toasters.name=products.name
FULL JOIN microwaves ON microwaves.name=products.name
WHERE
products.name IN (
'TOASTMASTER 3000',
'TOASTMASTER 3000Rev01',
'A3452 Ultra Microwave Oven'
)
),
warehousesWanted AS
(
SELECT *
FROM Warehouse
WHERE name in ('WH-1', 'WH-2')
)
-- left join to ensure all item categories are included
SELECT COALESCE(Stock.amount,0) as amount, *
FROM selIProducts sp
CROSS JOIN warehousesWanted ww
LEFT JOIN Stock ON Stock.itemId = sp.itemId
and ww.Name = Stock.Warehouse;
You may need to correct the ON clause, as I'm not sure what the proper column names of your real tables are.
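The cross-join idea from both answers can be tried on a cut-down version of the schema. This is only a sketch: it uses Python's built-in sqlite3 instead of Postgres, keeps just the products, warehouse and stock tables with the column names listed in the question, and invents the sample stock rows to match the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products(name TEXT PRIMARY KEY);
CREATE TABLE warehouse(name TEXT PRIMARY KEY);
CREATE TABLE stock(product TEXT, warehouse TEXT, amount INTEGER);
INSERT INTO products VALUES ('TOASTMASTER 3000'), ('TOASTMASTER 3000Rev01');
INSERT INTO warehouse VALUES ('WH-1'), ('WH-2');
-- Sample data (assumed): each warehouse stocks only one of the two products.
INSERT INTO stock VALUES ('TOASTMASTER 3000Rev01', 'WH-1', 1),
                         ('TOASTMASTER 3000', 'WH-2', 2);
""")

# CROSS JOIN builds every (product, warehouse) pair first; the LEFT JOIN then
# attaches a stock row where one exists, and COALESCE turns the gaps into 0.
rows = conn.execute("""
    SELECT w.name, p.name, COALESCE(s.amount, 0) AS amount
    FROM products p
    CROSS JOIN warehouse w
    LEFT JOIN stock s ON s.product = p.name AND s.warehouse = w.name
    WHERE w.name IN ('WH-1', 'WH-2')
    ORDER BY w.name, p.name
""").fetchall()

for row in rows:
    print(row)
```

Filtering on w.name in WHERE is safe here because warehouse is on the preserved side of the LEFT JOIN; only conditions on stock need to live in the ON clause.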

How does LEFT OUTER JOIN work in SQL Server?

First: I know how to use all types of join, but I don't know why it works like this for this query.
I have a scenario where I need a SQL query using 3 tables and a left outer join between selling and order items.
My Tables:
--------------------
Item
--------------------
ID | Code
--------------------
1 | 7502
SQL > select * from Item where id = 1
---------------------
Item_Order
---------------------------
Item | Box | Quantity
---------------------------
1 | 30 | 15000
1 | 12 | 6000
SQL > select * from Item_Order where Item = 1
--------------------------
Invoice_Item
-------------------
Item | Num | Quantity
-------------------------
1 | 1.64 | 10
1 | 2.4 | 8
SQL > select * from Invoice_Item where Item = 1
I want this output:
Item | OrderQ | OrderB | SellN | SellQ
-----------------------------------------
1 | 15000 | 30 | 1.64 | 10
1 | 6000 | 12 | 2.4 | 8
My SQL code:
SELECT Item.ID, Item_Order.Box As OrderB, Item_Order.Quantity As OrderQ, Invoice_Item.Num As SellN, Invoice_Item.Quantity As SellQ
FROM Item LEFT OUTER JOIN
Invoice_Item ON Item.ID = Invoice_Item.Item LEFT OUTER JOIN
Item_Order ON Item_Order.Item = Item.ID
where Item.ID = 1
Why is my output doubled? Why does my query return 4 records?
Your result can be achieved with row_number:
select a.ID
, a.OrderB
, a.OrderQ
, b.Quantity SellQ
, b.Num SellN
from
(SELECT Item.ID
, Item_Order.Box As OrderB
, Item_Order.Quantity As OrderQ
, row_number () over (order by Item.ID) rn
FROM Item
left outer JOIN Item_Order ON Item.ID = Item_Order.Item) a
left outer join (select Item
, Num
, Quantity
, row_number () over (order by Item) rn
from Invoice_Item ) b
on a.ID = b.Item
and a.rn = b.rn
You can add more tables like this:
left outer join (select Item
, Num
, Quantity
, row_number () over (order by Item) rn
from Invoice_Item ) b
Because when you first join Item with one of the child tables, it outputs two records, since that table has two records for the item. This intermediate result is then left joined with the other child table, and each of those two records is joined with all of its matching records.
You can better understand it like this:
SELECT Item.ID, Invoice_Item.Num As SellN, Invoice_Item.Quantity As SellQ
INTO table4 -- only to explain: store the first join's result
FROM Item LEFT OUTER JOIN
Invoice_Item ON Item.ID = Invoice_Item.Item
where Item.ID = 1
Now the result of the first query, table4, will be joined with Item_Order.
You are joining on one key -- two rows with the same key in one table times two rows in the second table = 4 rows.
You need a separate key. You can generate one using row_number():
SELECT i.ID, io.Box As OrderB, io.Quantity As OrderQ,
ii.Num As SellN, ii.Quantity As SellQ
FROM Item i LEFT OUTER JOIN
((SELECT ii.*,
ROW_NUMBER() OVER (PARTITION BY ii.item ORDER BY ii.item) as seqnum
FROM Invoice_Item ii
) ii FULL JOIN
(SELECT io.*,
ROW_NUMBER() OVER (PARTITION BY io.item ORDER BY io.item) as seqnum
FROM Item_Order io
) io
ON io.Item = ii.Item AND io.seqnum = ii.seqnum
)
-- COALESCE keeps rows that exist in only one of the two tables
ON i.ID = COALESCE(ii.Item, io.Item)
where i.ID = 1;
Note that this is one of the few cases where I use parentheses in the FROM clause. This code can handle additional rows in either of the tables -- if one table is longer than the other, the columns from the other will be NULL.
If you know the two tables have the same number of rows (for a given item) you can just use inner joins and no parentheses.
It is duplicating because you have no secondary association between Invoice_Item and Item_Order. Each record in Invoice_Item is matched to Item_Order (known as a Cartesian result) based ONLY on the Item ID. Your data APPEARS to be meant as a 1:1 reference, such that the first Invoice_Item Quantity of 10 is MEANT to be associated with Item_Order Box = 30, and Quantity 8 is MEANT to be associated with Item_Order Box = 12.
Item_Order
Item Box Quantity
1 30 15000
1 12 6000
Invoice_Item
Item Num Quantity
1 1.64 10
1 2.4 8
You probably need to add the "Box" reference to Invoice_Item so that Item_Order and Invoice_Item are a 1:1 match.
What is happening is that each row in Invoice_Item is joined to every Item_Order row with the same Item ID, so each order row appears twice. If you had 3 Invoice_Item rows and 6 Item_Order rows for item 1, you would get 18 rows.
FEEDBACK
Even though you have an accepted answer based on OVER/PARTITION/ROW_NUMBER, that process forces a surrogate secondary ID onto each row. Relying on this approach is not ideal for an overall data structure association. What happens if you delete the second item on an order? Are you positive you are also deleting the second item in Invoice_Item?
As for returning 2 records in the original scenario, you can do that via the surrogate process, but I think it would be better for you long term to understand what is happening in the join. Go back to your sample data of Item_Order and Invoice_Item, and start with the Item_Order table. The SQL engine is going to process each row individually.
First row SQL grabs Item = 1, Box = 30, Qty = 15000.
Now it joins to the Invoice_Item table, and your criteria join only on Item. So it sees the first row and says: yup, this is item 1, so include that with the Item_Order record (first row returned). Now it goes to the second line in the Invoice_Item table: yup, it too is item 1, so it returns it again (second row returned).
Now, SQL grabs the second row Item = 1, Box = 12, Qty = 6000.
It goes back to the Invoice_Item table and does the exact same test for each row, producing the 3rd and 4th rows, hence your doubling. If either table had more records with the same Item ID, it would return that many more records: 3 and 3 records would return 9 rows, 4 and 4 records would return 16 rows, etc. The surrogate approach will work, but I don't think it is as safe as a better/updated design structure.
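Both effects described in this thread, the 2 x 2 = 4 row multiplication and the row_number pairing, can be reproduced on the sample data. This is a minimal sketch using Python's built-in sqlite3 rather than SQL Server (window functions need SQLite 3.25 or newer). Ordering by rowid, i.e. insertion order, is an assumption made for the example; the tables themselves have no column that defines which invoice row belongs to which order row, which is the fragility the feedback above warns about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item_Order(Item INTEGER, Box INTEGER, Quantity INTEGER);
CREATE TABLE Invoice_Item(Item INTEGER, Num REAL, Quantity INTEGER);
INSERT INTO Item_Order VALUES (1, 30, 15000), (1, 12, 6000);
INSERT INTO Invoice_Item VALUES (1, 1.64, 10), (1, 2.4, 8);
""")

# Joining on Item alone pairs every order row with every invoice row.
multiplied = conn.execute("""
    SELECT COUNT(*)
    FROM Item_Order io JOIN Invoice_Item ii ON ii.Item = io.Item
""").fetchone()[0]

# Numbering the rows per item gives a second join key, so rows pair up 1:1.
paired = conn.execute("""
    SELECT io.Item, io.Quantity, io.Box, ii.Num, ii.Quantity
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY Item ORDER BY rowid) AS rn
          FROM Item_Order) io
    JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY Item ORDER BY rowid) AS rn
          FROM Invoice_Item) ii
      ON ii.Item = io.Item AND ii.rn = io.rn
    ORDER BY io.rn
""").fetchall()

print(multiplied)
print(paired)
```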

What is the best way to join tables

This is more of a general question.
I am looking for the best way to join 4, maybe 5 different tables. I am trying to create a Power BI report pulling live information from an IBM AS400, where customer service can type one of our part numbers, see how many parts we have in inventory and, if there are none, see the lead time and whether any orders have already been entered for the typed part number.
SERI is our inventory table with 37180 records.
(active inventory that is available)
METHDM is our kit table with 37459 records.
(this table contains the bill of materials for custom kits; KIT A123 contains different part numbers, which are in SERI as well.)
STKA is our part lead time table with 76796 records.
(lead time means how long will it take for parts to come in)
OCRI is our sales order table with 6497 records.
(This table contains all customer orders)
I have some knowledge of writing queries, but this one is more challenging than what I have created in the past. Should I start with the table that has the most records and left join the rest onto it?
From STKA 76796 records
Left join METHDM 37459 records on STKA
left join SERI 37180 records on STKA
left join OCRI 6497 records on STKA
Select
STKA.v6part as part,
STKA.v6plnt as plant,
STKA.v6tdys as pur_leadtime,
STKA.v6prpt as Pur_PrepLeadtime,
STKA.v6lead as Mfg_leadtime,
STKA.v6prpt as Mfg_PrepLeadTime,
METHDM.AQMTLP AS COMPONENT,
METHDM.AQQPPC AS QTYNEEDED,
SERI.HTLOTN AS BATCH,
SERI.HTUNIT AS UOM,
(HTQTY - HTQTYC) as ONHAND,
OCRI.DDORD# AS SALESORDER,
OCRI.DDRDAT AS PROMISED
from stka
left join METHDM on STKA.V6PART = METHDM.AQPART
left join SERI on STKA.V6PART = SERI.HTPART
left join OCRI on STKA.V6PART = OCRI.DDPART
Is this the best way to join the tables?
I think you already have your answer, but conceptually, there are a few issues here to deal with, and I figured I would give you a few examples, using data a little bit like yours, but massively simplified.
CREATE TABLE #STKA (V6PART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE #METHDM (AQPART INT, KIT_ID INT, SOME_DATE DATETIME, OTHER_DATA VARCHAR(50));
CREATE TABLE #SERI (HTPART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE #OCRI (DDPART INT, OTHER_DATA VARCHAR(50));
INSERT INTO #STKA SELECT 1, NULL UNION ALL SELECT 2, NULL UNION ALL SELECT 3, NULL; --1, 2, 3 Ids
INSERT INTO #METHDM SELECT 1, 1, '20200108 10:00', NULL UNION ALL SELECT 1, 2, '20200108 11:00', NULL UNION ALL SELECT 2, 1, '20200108 13:00', NULL; --1 Id appears twice, 2 Id once, no 3 Id
INSERT INTO #SERI SELECT 1, NULL UNION ALL SELECT 3, NULL; --1 and 3 Ids
INSERT INTO #OCRI SELECT 1, NULL UNION ALL SELECT 4, NULL; --1 and 4 Ids
So fundamentally we have a few issues here:
o the first problem is that the IDs in the tables differ, one table has an ID #4 but this isn't in any of the others;
o the second issue is that we have multiple rows for the same ID in one table;
o the third issue is that some tables are "missing" IDs that are in other tables, which you already covered by using LEFT JOINs, so I will ignore this.
--This will select ID 1 twice, 2 once, 3 once, and miss 4 completely
SELECT
*
FROM
#STKA
LEFT JOIN #METHDM ON #METHDM.AQPART = #STKA.V6PART
LEFT JOIN #SERI ON #SERI.HTPART = #STKA.V6PART
LEFT JOIN #OCRI ON #OCRI.DDPART = #STKA.V6PART;
So the problem here is that we don't have every ID in our "anchor" table STKA, and in fact there's no single table that has every ID in it. Now your data might be fine here, but if it isn't then you can simply add a step to find every ID, and use this as the anchor.
--This will select each ID, but still doubles up on ID 1
WITH Ids AS (
SELECT V6PART AS ID FROM #STKA
UNION
SELECT AQPART AS ID FROM #METHDM
UNION
SELECT HTPART AS ID FROM #SERI
UNION
SELECT DDPART AS ID FROM #OCRI)
SELECT
*
FROM
Ids I
LEFT JOIN #STKA ON #STKA.V6PART = I.Id
LEFT JOIN #METHDM ON #METHDM.AQPART = I.Id
LEFT JOIN #SERI ON #SERI.HTPART = I.Id
LEFT JOIN #OCRI ON #OCRI.DDPART = I.Id;
That's using a common-table expression, but a subquery would also do the job. However, this still leaves us with an issue where ID 1 appears twice in the list, because it has multiple rows in one of the sub-tables.
One way to fix this is to pick the row with the latest date, or any other ORDER you can apply to the data:
--Pick the best row for the table where it has multiple rows, now we get one row per ID
WITH Ids AS (
SELECT V6PART AS ID FROM #STKA
UNION
SELECT AQPART AS ID FROM #METHDM
UNION
SELECT HTPART AS ID FROM #SERI
UNION
SELECT DDPART AS ID FROM #OCRI),
BestMETHDM AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY AQPART ORDER BY SOME_DATE DESC) AS ORDER_ID
FROM
#METHDM)
SELECT
*
FROM
Ids I
LEFT JOIN #STKA ON #STKA.V6PART = I.Id
LEFT JOIN BestMETHDM ON BestMETHDM.AQPART = I.Id AND BestMETHDM.ORDER_ID = 1
LEFT JOIN #SERI ON #SERI.HTPART = I.Id
LEFT JOIN #OCRI ON #OCRI.DDPART = I.Id;
Of course you could also add some aggregation (SUM, MAX, MIN, AVG, etc.) to fix this problem (if it is indeed an issue). Also, I used a common-table expression, but this would work just as well with a subquery.
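The final query above can be checked end to end on the same sample data. This is a sketch using Python's built-in sqlite3 instead of SQL Server temp tables (an assumption for portability; window functions need SQLite 3.25 or newer). The table and column names are lower-cased copies of the ones in this answer, and the OTHER_DATA columns are dropped for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stka(v6part INTEGER);
CREATE TABLE methdm(aqpart INTEGER, kit_id INTEGER, some_date TEXT);
CREATE TABLE seri(htpart INTEGER);
CREATE TABLE ocri(ddpart INTEGER);
INSERT INTO stka VALUES (1), (2), (3);
INSERT INTO methdm VALUES (1, 1, '2020-01-08 10:00'),
                          (1, 2, '2020-01-08 11:00'),
                          (2, 1, '2020-01-08 13:00');
INSERT INTO seri VALUES (1), (3);
INSERT INTO ocri VALUES (1), (4);
""")

rows = conn.execute("""
    WITH ids AS (            -- every ID that appears in any table
        SELECT v6part AS id FROM stka
        UNION SELECT aqpart FROM methdm
        UNION SELECT htpart FROM seri
        UNION SELECT ddpart FROM ocri),
    best_methdm AS (         -- number each part's kit rows, newest first
        SELECT *, ROW_NUMBER() OVER (PARTITION BY aqpart
                                     ORDER BY some_date DESC) AS order_id
        FROM methdm)
    SELECT i.id, stka.v6part, m.kit_id, seri.htpart, ocri.ddpart
    FROM ids i
    LEFT JOIN stka ON stka.v6part = i.id
    LEFT JOIN best_methdm m ON m.aqpart = i.id AND m.order_id = 1
    LEFT JOIN seri ON seri.htpart = i.id
    LEFT JOIN ocri ON ocri.ddpart = i.id
    ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

Each ID now appears exactly once: ID 1 carries only its latest kit row (kit_id 2), and ID 4, which exists only in ocri, is no longer lost.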
Expanding on a comment made on the question..
I would say I will start with SERI as that table contains the entire inventory for our facility and should cover the other tables
However the question said
SERI is our inventory table with 37180 records. (active inventory that is available)
In my experience, active inventory, isn't the same as all parts.
Normally, in a query like this, I'd expect the first table to be a Parts Master table of some sort that contains every possible part ID.

Two group by tables stitch another table

I have 3 tables I need to put together.
The first table is my main transaction table where I need to get distinct transaction id numbers and company id. It has all the important keys. The transaction ids are not unique.
The second table has item info which is linked to transaction id numbers which are not unique and I need to pull items.
The third table has company info which has company id.
I've already solved part of this: the first table through a GROUP BY on the id, and the second through a subquery which creates unique ids and joins onto the first one.
The issue I'm having is the third one, the join by company. I cannot seem to create a query that works in combination with the above. Any ideas?
As suggested, here is my code. It runs, but only because I used count for the company, which doesn't give the correct company number. How else can I get the company number to come out correctly?
SELECT
dep.ItemIDAPK,
dep.TotalOne,
dep.company,
company.vendname,
appd.ItemIDAPK,
appd.ItemName
FROM (
SELECT
csi.ItemIDAPK,
sum(f.TotalOne) as TotalOne,
count(f.DimCurrentcompanyID) company
FROM dbo.ReportOne F with (nolock)
INNER JOIN dbo.DSaleItem csi with (nolock)
on f.DSaleItemID = csi.DSaleItemID
INNER JOIN dbo.DimCurrentcompany cv
ON f.DimCurrentcompanyID = cv.DimCurrentcompanyID
INNER JOIN dbo.DimDate dat
on f.DimDateID = dat.DimDateID
where (
dat.date >='2013-01-29 00:00:00.000'
and dat.date <= '2013-01-30 00:00:00.000'
)
GROUP BY csi.ItemIDAPK
) as dep
INNER JOIN (
SELECT
vend.DimCurrentcompanyID,
vend.Name vendname
FROM dbo.DimCurrentcompany vend
) As company
on dep.company = company.DimCurrentcompanyID
INNER JOIN (
SELECT
c2.ItemIDAPK,
ItemName
FROM (
SELECT DISTINCT ItemIDAPK
FROM dbo.dimitem AS C
) AS c1
JOIN dbo.dimitem AS c2 ON c1.ItemIDAPK = c2.ItemIDAPK
) as appd
ON dep.ItemIDAPK = appd.ItemIDAPK
For further information, my output is shown in the following example. I know the code executes and that the CompanyID is incorrect, as I only put a count in there to make the above code execute:
Current Results:
Item Number | TLS | CompanyID | Company Name | Item Number | Item Name
111111      | 300 | 303       | Johnson Corp | 29323       | Soap
Proposed Results:
Item Number | TLS | CompanyID | Company Name | Item Number | Item Name
111111      | 300 | 29        | Johnson Corp | 29323       | Soap

How to ensure outer join with filter still returns all desired rows?

Imagine I have two tables in a DB like so:
products:
product_id  name
----------------
1           Hat
2           Gloves
3           Shoes
sales:
product_id  store_id  sales
----------------------------
1           1         20
2           2         10
Now I want to do a query to list ALL products, and their sales, for store_id = 1. My first crack at it would be to use a left join, and filter to the store_id I want, or a null store_id, in case the product didn't get any sales at store_id = 1, since I want all the products listed:
SELECT name, coalesce(sales, 0)
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id
WHERE store_id = 1 or store_id is null;
Of course, this doesn't work as intended, instead I get:
name     sales
---------------
Hat      20
Shoes    0
No Gloves! This is because Gloves did get sales, just not at store_id = 1, so the WHERE clause has filtered them out.
How then can I get a list of ALL products and their sales for a specific store?
Here are some queries to create the test tables:
create temp table test_products as
select 1 as product_id, 'Hat' as name;
insert into test_products values (2, 'Gloves');
insert into test_products values (3, 'Shoes');
create temp table test_sales as
select 1 as product_id, 1 as store_id, 20 as sales;
insert into test_sales values (2, 2, 10);
UPDATE: I should note that I am aware of this solution:
SELECT name, case when store_id = 1 then sales else 0 end as sales
FROM test_products p
LEFT JOIN test_sales s ON p.product_id = s.product_id;
however, it is not ideal... in reality I need to create this query for a BI tool in such a way that the tool can simply add a where clause to the query and get the desired results. Inserting the required store_id into the correct place in this query is not supported by this tool. So I'm looking for other options, if there are any.
Move the WHERE condition into the LEFT JOIN clause so that rows don't go missing.
SELECT p.name, coalesce(s.sales, 0)
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id
AND s.store_id = 1;
Edit for additional request:
I assume you can manipulate the SELECT items? Then this should do the job:
SELECT p.name
,CASE WHEN s.store_id = 1 THEN coalesce(s.sales, 0) ELSE NULL END AS sales
FROM products p
LEFT JOIN sales s USING (product_id)
Also simplified the join syntax in this case.
I'm not near SQL, but give this a shot:
SELECT name, coalesce(sales, 0)
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id AND store_id = 1
You don't want a WHERE on the whole query, just a condition on your join.
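For the BI-tool constraint in the question (the tool can only append a WHERE clause), one possible workaround is to expose every product/store combination through a view, so that a plain WHERE store_id = 1 is enough. The sketch below is an assumption-laden illustration, not something proposed in the answers: the view name product_store_sales is hypothetical, the list of stores is derived from the sales table itself (a real stores table would be a better source), and it runs on Python's built-in sqlite3 rather than the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products(product_id INTEGER, name TEXT);
CREATE TABLE sales(product_id INTEGER, store_id INTEGER, sales INTEGER);
INSERT INTO products VALUES (1, 'Hat'), (2, 'Gloves'), (3, 'Shoes');
INSERT INTO sales VALUES (1, 1, 20), (2, 2, 10);

-- Hypothetical view: one row per (product, store), 0 where nothing was sold.
CREATE VIEW product_store_sales AS
SELECT p.name, st.store_id, COALESCE(s.sales, 0) AS sales
FROM products p
CROSS JOIN (SELECT DISTINCT store_id FROM sales) st
LEFT JOIN sales s ON s.product_id = p.product_id
                 AND s.store_id = st.store_id;
""")

# The tool only has to append an ordinary WHERE clause to the view.
store_1 = conn.execute("""
    SELECT name, sales FROM product_store_sales
    WHERE store_id = 1
    ORDER BY name
""").fetchall()

print(store_1)
```

The filter works here because store_id comes from the cross-joined store list, not from the nullable side of the LEFT JOIN, so no product row is discarded.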