Left Outer Join Without Duplicate Rows - sql

After a lot of searching I could not solve my problem. I have the following tables:
I want to select all records from my 'product' table, but I get multiple rows per product when I execute the following query:
SELECT dbo.product.id, dbo.product.name, dbo.product_price.value,
dbo.product_barcode.barcode
FROM dbo.product LEFT OUTER JOIN
dbo.product_price ON dbo.product.id = dbo.product_price.product_id LEFT OUTER JOIN
dbo.product_barcode ON dbo.product.id = dbo.product_barcode.product_id
My problem was solved with the following query:
SELECT dbo.product.id, dbo.product.name, dbo.product_price.value,
dbo.product_barcode.barcode
FROM dbo.product LEFT OUTER JOIN
dbo.product_price ON dbo.product.id = dbo.product_price.product_id LEFT OUTER JOIN
dbo.product_barcode ON dbo.product.id = dbo.product_barcode.product_id
WHERE (dbo.product_price.id IN
(SELECT MIN(id) AS minPriceID
FROM dbo.product_price AS product_price_1
GROUP BY product_id)) AND (dbo.product_barcode.id IN
(SELECT MIN(id) AS Expr1
FROM dbo.product_barcode AS product_barcode_1
GROUP BY product_id))
Now I have just one problem: if the 'product_price' table or the 'product_barcode' table has no matching record for a product, that product is not returned at all. I should still get the row from the 'product' table, with NULL in the columns that come from the other tables.
Please help me. Thanks.

I think the problem is that the conditions are in the WHERE clause. Try moving them out of the WHERE clause and into each join's ON clause:
SELECT dbo.product.id, dbo.product.name, dbo.product_price.value,
dbo.product_barcode.barcode
FROM dbo.product
LEFT OUTER JOIN dbo.product_price ON dbo.product.id = dbo.product_price.product_id
--moved from WHERE clause
AND dbo.product_price.id IN (SELECT MIN(id) AS minPriceID FROM dbo.product_price AS product_price_1 GROUP BY product_id)
LEFT OUTER JOIN dbo.product_barcode ON dbo.product.id = dbo.product_barcode.product_id
--moved from WHERE clause
AND dbo.product_barcode.id IN (SELECT MIN(id) AS Expr1 FROM dbo.product_barcode AS product_barcode_1 GROUP BY product_id)
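To see why the position of the filter matters, here is a minimal sketch run against an in-memory SQLite database from Python. The table contents are made up for illustration, and SQLite stands in for SQL Server, so the dbo. prefix is dropped; only the product and product_price tables are modeled.

```python
import sqlite3

# Hypothetical miniature of the product / product_price tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_price (id INTEGER PRIMARY KEY, product_id INTEGER, value REAL);
    INSERT INTO product VALUES (1, 'pen'), (2, 'ink');
    -- product 1 has two prices, product 2 has none
    INSERT INTO product_price VALUES (10, 1, 1.50), (11, 1, 1.75);
""")

min_ids = "SELECT MIN(id) FROM product_price GROUP BY product_id"

# Filter in WHERE: the NULL-extended row for product 2 fails the IN test
# and is removed after the join.
where_rows = conn.execute(f"""
    SELECT p.id, pp.value
    FROM product AS p
    LEFT JOIN product_price AS pp ON p.id = pp.product_id
    WHERE pp.id IN ({min_ids})
    ORDER BY p.id
""").fetchall()

# Filter in ON: the condition only decides which price rows match,
# so product 2 is kept with a NULL price.
on_rows = conn.execute(f"""
    SELECT p.id, pp.value
    FROM product AS p
    LEFT JOIN product_price AS pp
        ON p.id = pp.product_id AND pp.id IN ({min_ids})
    ORDER BY p.id
""").fetchall()

print(where_rows)  # [(1, 1.5)]
print(on_rows)     # [(1, 1.5), (2, None)]
```

The same reasoning applies to the product_barcode join.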

SELECT dbo.product.id, dbo.product.name, MIN(dbo.product_price.value) AS value, MIN(dbo.product_barcode.barcode) AS barcode
FROM dbo.product
LEFT OUTER JOIN dbo.product_price ON dbo.product.id = dbo.product_price.product_id
LEFT OUTER JOIN dbo.product_barcode ON dbo.product.id = dbo.product_barcode.product_id
GROUP BY dbo.product.id, dbo.product.name
The query above should achieve your desired result: one row per product, with the minimum product_price value (if one exists) and the minimum product_barcode (if one exists).
I only assumed this is what you wanted, based on the query you were writing. You need to spend more time thinking about the question you are trying to answer.
A join multiplies your rows whenever one of the joined tables has more than one entry per join key.
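A small sketch of that multiplication, using Python's sqlite3 module with invented sample data (one product, two prices, three barcodes). SQLite is used here only because it is easy to run; it is not the asker's SQL Server setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_price (id INTEGER PRIMARY KEY, product_id INTEGER, value REAL);
    CREATE TABLE product_barcode (id INTEGER PRIMARY KEY, product_id INTEGER, barcode TEXT);
    INSERT INTO product VALUES (1, 'pen');
    INSERT INTO product_price VALUES (10, 1, 1.50), (11, 1, 1.75);
    INSERT INTO product_barcode VALUES (20, 1, 'A'), (21, 1, 'B'), (22, 1, 'C');
""")

joins = """
    FROM product AS p
    LEFT JOIN product_price AS pp ON p.id = pp.product_id
    LEFT JOIN product_barcode AS pb ON p.id = pb.product_id
"""

# 2 prices x 3 barcodes = 6 rows for a single product.
joined_count = conn.execute(f"SELECT COUNT(*) {joins}").fetchone()[0]

# Grouping by the product columns collapses them back to one row.
grouped_rows = conn.execute(f"""
    SELECT p.id, p.name, MIN(pp.value), MIN(pb.barcode)
    {joins}
    GROUP BY p.id, p.name
""").fetchall()

print(joined_count)  # 6
print(grouped_rows)  # [(1, 'pen', 1.5, 'A')]
```

Note that MIN is applied to each column independently, so the value and barcode in the grouped row need not come from the rows with the lowest id.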

Just remember this diagram:
When you call LEFT JOIN and then apply a filter like in the second picture from the top left, you will get only the items that belong to table A.
I want to select all records from my 'product' table, but I get multiple rows per product when I execute the following query:
Can you try calling DISTINCT after SELECT, like this:
SELECT DISTINCT dbo.product.id, dbo.product.name, dbo.product_price.value,
dbo.product_barcode.barcode
...

Related

Select and join returning duplicate data

I have some tables that can be accessed here and I would like to get a new table with EntryId from Entry table and ProtocolNumber from JudicialOrder table. For that I'm using this query:
SELECT DISTINCT ET.EntryId, JOA.ProtocolNumber FROM Entry AS ET
LEFT JOIN JudicialOrderAccount AS JOT ON JOT.AccountId = ET.OwnerAccountId
INNER JOIN JudicialOrder AS JOA ON JOA.JudicialOrderId = JOT.JudicialOrderId;
But the ProtocolNumber is duplicated, what could be wrong with my query?
As Kurt said, only the combination of ET.EntryId and JOA.ProtocolNumber is unique. You will see this if you add an ORDER BY:
SELECT DISTINCT ET.EntryId, JOA.ProtocolNumber FROM Entry AS ET
LEFT JOIN JudicialOrderAccount AS JOT ON JOT.AccountId = ET.OwnerAccountId
INNER JOIN JudicialOrder AS JOA ON JOA.JudicialOrderId = JOT.JudicialOrderId
ORDER BY ET.EntryId, JOA.ProtocolNumber;
If you really want each protocol number to appear only once, you need to group by ProtocolNumber and wrap EntryId in a string aggregation function such as string_agg (the exact function depends on your database).
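As a rough illustration of the grouping idea, here is a sketch using Python's sqlite3 module. SQLite spells the aggregation GROUP_CONCAT rather than string_agg, and the three tables and their rows are a guess at the schema, reduced to the columns the query uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal schema; only the columns used by the query are modeled.
    CREATE TABLE Entry (EntryId INTEGER PRIMARY KEY, OwnerAccountId INTEGER);
    CREATE TABLE JudicialOrderAccount (AccountId INTEGER, JudicialOrderId INTEGER);
    CREATE TABLE JudicialOrder (JudicialOrderId INTEGER PRIMARY KEY, ProtocolNumber TEXT);
    INSERT INTO Entry VALUES (1, 100), (2, 100);
    INSERT INTO JudicialOrderAccount VALUES (100, 7);
    INSERT INTO JudicialOrder VALUES (7, 'P-7');
""")

joins = """
    FROM Entry AS ET
    LEFT JOIN JudicialOrderAccount AS JOT ON JOT.AccountId = ET.OwnerAccountId
    INNER JOIN JudicialOrder AS JOA ON JOA.JudicialOrderId = JOT.JudicialOrderId
"""

# DISTINCT works on the (EntryId, ProtocolNumber) pair, so 'P-7' shows up twice.
distinct_rows = conn.execute(f"""
    SELECT DISTINCT ET.EntryId, JOA.ProtocolNumber {joins}
    ORDER BY ET.EntryId, JOA.ProtocolNumber
""").fetchall()

# One row per protocol number, with the entry ids folded into a single string.
grouped_rows = conn.execute(f"""
    SELECT JOA.ProtocolNumber, GROUP_CONCAT(ET.EntryId) AS EntryIds {joins}
    GROUP BY JOA.ProtocolNumber
""").fetchall()

# GROUP_CONCAT does not promise an element order, so sort before printing.
protocol, entry_ids = grouped_rows[0]
print(distinct_rows)                            # [(1, 'P-7'), (2, 'P-7')]
print(protocol, sorted(entry_ids.split(",")))   # P-7 ['1', '2']
```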
FYI: Your LEFT JOIN - INNER JOIN combination ends up being two INNER JOINs, see

SQL query with 2 counts and 2 left outer joins

I'm trying to show all columns from my t1_elem table, plus two extra columns computed with COUNT.
I used this query:
SELECT p.*,COUNT(t4_id) as ile_publikacji, COUNT(t7_id) as ile_fitow
FROM t1_elem p
LEFT OUTER JOIN t4_autorzy ON p.t1_id=t4_autorzy.t4_t1_id
LEFT JOIN t7_pliki ON p.t1_id=t7_pliki.t7_t1_id
GROUP BY t1_id
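But the results are wrong. What am I doing wrong?
You probably have multiple matches in both joined tables. As written, the two counts come out the same whenever an element has matches in both tables, because each one counts the rows of the combined join (the product of the two match counts). The simplest solution is probably to use DISTINCT: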
SELECT p.*, COUNT(DISTINCT t4_id) as ile_publikacji, COUNT(DISTINCT t7_id) as ile_fitow
FROM t1_elem p LEFT JOIN
t4_autorzy
ON p.t1_id = t4_autorzy.t4_t1_id LEFT JOIN
t7_pliki
ON p.t1_id=t7_pliki.t7_t1_id
GROUP BY t1_id
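A quick way to check this is a sketch with Python's sqlite3 module and invented rows: element 1 has two authors and three files, element 2 has neither. The tables are reduced to their key columns, which is an assumption about the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1_elem (t1_id INTEGER PRIMARY KEY);
    CREATE TABLE t4_autorzy (t4_id INTEGER PRIMARY KEY, t4_t1_id INTEGER);
    CREATE TABLE t7_pliki (t7_id INTEGER PRIMARY KEY, t7_t1_id INTEGER);
    INSERT INTO t1_elem VALUES (1), (2);
    INSERT INTO t4_autorzy VALUES (41, 1), (42, 1);
    INSERT INTO t7_pliki VALUES (71, 1), (72, 1), (73, 1);
""")

joins = """
    FROM t1_elem p
    LEFT JOIN t4_autorzy ON p.t1_id = t4_autorzy.t4_t1_id
    LEFT JOIN t7_pliki ON p.t1_id = t7_pliki.t7_t1_id
    GROUP BY p.t1_id
    ORDER BY p.t1_id
"""

# Plain COUNT counts joined rows: 2 authors x 3 files = 6 for both columns.
plain_counts = conn.execute(
    f"SELECT p.t1_id, COUNT(t4_id), COUNT(t7_id) {joins}"
).fetchall()

# COUNT(DISTINCT ...) counts each author and each file once.
distinct_counts = conn.execute(
    f"SELECT p.t1_id, COUNT(DISTINCT t4_id), COUNT(DISTINCT t7_id) {joins}"
).fetchall()

print(plain_counts)     # [(1, 6, 6), (2, 0, 0)]
print(distinct_counts)  # [(1, 2, 3), (2, 0, 0)]
```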

Why is LEFT JOIN deleting rows?

I have been using sql for a long time, but I am now working in Databricks and I am getting a very strange result. I have a table called block_durations with a set of ids (called block_ts), and I have another table called mergetable, which I want to left join to that table. Mergetable is indexed by acct_id and block_ts, so it has many different records for each block_ts. I want to keep the rows in block_durations that don't match, and if there are multiple matches in mergetable I want there to be multiple corresponding entries in the resulting join, as you would expect from a left join.
But this is not happening. In order to demonstrate this, I am showing the result of joining mergetable, after filtering for a single acct_id so that there is at most one match per block_ts.
select count(*) from mergetable where acct_id = '0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98'
16579
select count(*) from block_durations
82817
select count(*) from
(
SELECT
mt.*,
bd.block_duration
FROM
block_durations bd
left outer JOIN mergetable mt
ON mt.block_ts = bd.block_ts
where acct_id='0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98'
) countTable
16579
As you can see, even though there are >80000 records in block_durations, most of them are getting lost in the left join. Why is this happening? I thought the whole point of a left join is that the non-matching rows of the left table are kept. This is exactly the behavior I would expect from an inner join -- and indeed when I switch to an inner join nothing changes.
Could someone please help me figure out what's going on?
-Paul
All rows from the left side of the join are preserved by the join itself, but the WHERE condition runs afterwards on the joined result. For the rows of block_durations without a match, acct_id is NULL, so the WHERE condition removes them.
Merge your WHERE condition into the JOIN condition:
SELECT
mt.*,
bd.block_duration
FROM
block_durations bd
left outer JOIN mergetable mt
ON mt.block_ts = bd.block_ts AND mt.acct_id='0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98'
You can also filter mergetable before you run JOIN on the results:
SELECT
mt.*,
bd.block_duration
FROM
block_durations bd
left outer JOIN (SELECT * FROM mergetable WHERE acct_id='0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98') mt
ON mt.block_ts = bd.block_ts
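The three variants can be compared on a toy dataset. This is only a sketch with Python's sqlite3 module, not Databricks, and the rows and the shortened account ids are invented; the join semantics are standard SQL, so the counts should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE block_durations (block_ts INTEGER PRIMARY KEY, block_duration INTEGER);
    CREATE TABLE mergetable (acct_id TEXT, block_ts INTEGER);
    INSERT INTO block_durations VALUES (1, 12), (2, 13), (3, 11);
    INSERT INTO mergetable VALUES ('0xaaa', 1), ('0xbbb', 1), ('0xbbb', 2);
""")

acct = "0xaaa"  # hypothetical stand-in for the long account id in the question

# Filter in WHERE: unmatched rows have acct_id NULL and are thrown away.
where_count = conn.execute("""
    SELECT COUNT(*) FROM block_durations bd
    LEFT JOIN mergetable mt ON mt.block_ts = bd.block_ts
    WHERE mt.acct_id = ?
""", (acct,)).fetchone()[0]

# Filter in ON: every block_durations row survives.
on_count = conn.execute("""
    SELECT COUNT(*) FROM block_durations bd
    LEFT JOIN mergetable mt ON mt.block_ts = bd.block_ts AND mt.acct_id = ?
""", (acct,)).fetchone()[0]

# Filter mergetable first, then join: same result as the ON variant.
derived_count = conn.execute("""
    SELECT COUNT(*) FROM block_durations bd
    LEFT JOIN (SELECT * FROM mergetable WHERE acct_id = ?) mt
        ON mt.block_ts = bd.block_ts
""", (acct,)).fetchone()[0]

print(where_count, on_count, derived_count)  # 1 3 3
```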

T-SQL Left-Join with 1 row (limit, subselect)

I have already read a lot on this topic, but I'm unable to get it to work for my case.
I have the following situation:
A list of order items (the main datasets I want to get)
Articles, which have a 1:1 relation to an order item
An n:m join table "Articlesupplier", which creates a relation between an article and a partner
A Partner table with detailed information about partners
Target:
One dataset per OrderItem, and from the suppliers I only want to get the first one found in the join. No prioritization required.
Tables:
Table IDX_ORDERITEM
id,article_id
Table IDX_ARTICLE
id,name
Table IDX_ARTICLESUPPLIER
article_id,partner_id
Table IDX_PARTNER
id,abbr
My actual statement (short version):
SELECT IDX_ORDERITEM.id
FROM
dbo.IDX_ORDERITEM AS IDX_ORDERITEM
-- ARTICLE --
INNER JOIN dbo.IDX_ARTICLE AS IDX_ARTICLE
ON IDX_ORDERITEM.article_id=IDX_ARTICLE.id
-- SUPPLIER VIA ARTICLE --
LEFT JOIN
(SELECT TOP(1) IDX_PARTNER.id, IDX_PARTNER.abbr
FROM IDX_PARTNER, IDX_ARTICLESUPPLIER
WHERE IDX_PARTNER.id = IDX_ARTICLESUPPLIER.partner_id
AND IDX_ARTICLESUPPLIER.article_id=IDX_ARTICLE.id) AS IDX_PARTNER_SUPPLIER
ON IDX_PARTNER_SUPPLIER.id=IDX_ARTICLE.supplier_partner_id
WHERE 1>0
ORDER BY orderitem.id DESC
But it seems I can't access IDX_ARTICLE.id in the subquery. I get the following error message:
The multi-part identifier "IDX_ARTICLE.id" could not be bound.
Is the problem that the Article alias has the same name as the table name?
Thanks a lot in advance for possible ideas,
Mike
Well, I changed your aliases and the subquery you were joining to (I also rewrote that subquery so it no longer uses implicit joins), though those changes were mostly cosmetic. The important change was the use of OUTER APPLY instead of LEFT JOIN:
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
OUTER APPLY
(SELECT TOP(1) P.id, P.abbr
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
WHERE SUP.article_id = A.id
AND P.id = A.supplier_partner_id) AS PS
ORDER BY OI.id DESC
The error is thrown because the following part of the query
(SELECT TOP(1) IDX_PARTNER.id, IDX_PARTNER.abbr
FROM IDX_PARTNER, IDX_ARTICLESUPPLIER
WHERE IDX_PARTNER.id = IDX_ARTICLESUPPLIER.partner_id
AND IDX_ARTICLESUPPLIER.article_id=IDX_ARTICLE.id) AS IDX_PARTNER_SUPPLIER
is a derived table, not a correlated subquery, so it cannot reference IDX_ARTICLE.id the way a correlated subquery references a column of the outer query.
I see two problems.
First, according to your DDL there is no IDX_ARTICLE.supplier_partner_id column, which you refer to in the LEFT JOIN's ON clause.
Second, I'm quite sure you cannot use IDX_ARTICLE.id inside your derived table. Simply add IDX_ARTICLESUPPLIER.article_id to the columns selected by the derived table and compare it against IDX_ARTICLE.id in the LEFT JOIN's ON clause.
I prefer to avoid nested queries. If I can, I will always rewrite them using a CTE.
WITH Part_Sup
AS (
SELECT TOP ( 1 ) P.id
,P.abbr
,SUP.article_id
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
)
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
LEFT OUTER JOIN Part_Sup AS PS
ON PS.article_id = A.Id
AND PS.id = A.supplier_partner_id
ORDER BY OI.id DESC;
Next, I rewrote the first query to use the ROW_NUMBER() function instead of TOP (1). With ROW_NUMBER you can control which rows you keep and which you discard.
WITH Part_Sup
AS (
SELECT P.id
,P.abbr
,SUP.article_id
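,ROW_NUMBER() OVER ( PARTITION BY SUP.article_id ORDER BY P.id ) AS RowNum -- T-SQL requires ORDER BY here; partitioning per article keeps one supplier per article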
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
)
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
LEFT OUTER JOIN Part_Sup AS PS
ON PS.article_id = A.Id
AND PS.id = A.supplier_partner_id
AND RowNum = 1
ORDER BY OI.id DESC;
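For anyone who wants to try the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The sample rows are invented, the order item table is left out, and the join on supplier_partner_id is dropped because that column is not in the posted table definitions; the numbering is partitioned by article so that each article keeps one supplier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IDX_ARTICLE (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE IDX_PARTNER (id INTEGER PRIMARY KEY, abbr TEXT);
    CREATE TABLE IDX_ARTICLESUPPLIER (article_id INTEGER, partner_id INTEGER);
    INSERT INTO IDX_ARTICLE VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO IDX_PARTNER VALUES (5, 'ACME'), (6, 'BOLT');
    -- article 1 has two suppliers, article 2 has none
    INSERT INTO IDX_ARTICLESUPPLIER VALUES (1, 5), (1, 6);
""")

rows = conn.execute("""
    WITH Part_Sup AS (
        SELECT P.id, P.abbr, SUP.article_id,
               -- number the suppliers of each article; lowest partner id first
               ROW_NUMBER() OVER (PARTITION BY SUP.article_id ORDER BY P.id) AS RowNum
        FROM IDX_PARTNER AS P
        INNER JOIN IDX_ARTICLESUPPLIER AS SUP ON P.id = SUP.partner_id
    )
    SELECT A.id, PS.abbr
    FROM IDX_ARTICLE AS A
    LEFT OUTER JOIN Part_Sup AS PS
        ON PS.article_id = A.id AND PS.RowNum = 1
    ORDER BY A.id
""").fetchall()

print(rows)  # [(1, 'ACME'), (2, None)]
```

Because RowNum = 1 sits in the ON clause, articles without any supplier still come back, with NULL supplier columns.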
Thanks Lamak - you solved it :)
I used your input to extract the basic solution, to make it a bit easier to read for others who have the same problem:
Using OUTER APPLY (without ORDER_ITEM Table here):
SELECT IDX_ARTICLE.id AS AR_ID, IDX_PARTNER_SUPPLIER.id, IDX_PARTNER_SUPPLIER.abbr
FROM
dbo.IDX_ARTICLE AS IDX_ARTICLE
OUTER APPLY
(SELECT TOP(1) _PARTNER.id, _PARTNER.abbr
FROM IDX_PARTNER AS _PARTNER
INNER JOIN IDX_ARTICLESUPPLIER AS _ARTICLESUPPLIER
ON _PARTNER.id = _ARTICLESUPPLIER.partner_id
WHERE _ARTICLESUPPLIER.article_id=IDX_ARTICLE.id
AND _ARTICLESUPPLIER.deleted IS NULL) AS IDX_PARTNER_SUPPLIER
WHERE IDX_ARTICLE.id=67

Creating a nested query to filter from the initial query

I am attempting to create a nested query that selects all order numbers and order ids from the invoice items and then selects the items that were sub-rented.
select
do.orderid, do.orderno, ot.masteritemid, ot.qty
from dealorder do
inner join ordertran ot on do.orderid=ot.orderid and ot.orderid='A00M5BGA'
where ot.vendorid<>''
Select orderno, orderid from invoiceitemview where invoiceno='T646692'
I have tried an inner join, but it does not seem to work. The first query gives me 6 items, which is correct; however, if I perform the join, it seems to return items that do not belong to the order. How would I create a nested query that gets all items from the second query and then filters them using the first query?
This sounds like what you are looking for
select
do.orderid, do.orderno, ot.masteritemid, ot.qty
from dealorder do
inner join ordertran ot on do.orderid=ot.orderid and ot.orderid='A00M5BGA'
inner join (
Select orderno, orderid from invoiceitemview where invoiceno='T646692'
) tmp ON tmp.orderno=do.orderno AND tmp.orderid=do.orderid
where ot.vendorid<>''
Try this.
select do.orderid, do.orderno, ot.masteritemid, ot.qty
from (Select orderno, orderid from invoiceitemview where invoiceno='T646692') inv
inner join dealorder do on inv.orderid=do.orderid
inner join ordertran ot on do.orderid=ot.orderid and ot.orderid='A00M5BGA'
where ot.vendorid<>''
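One thing worth checking with either form: if invoiceitemview holds one row per invoice line, the derived table returns the same order several times and the join repeats the items. The sketch below (Python's sqlite3 module, invented rows, an assumed minimal schema, and short aliases d/t/inv chosen for this example) shows the effect and a DISTINCT in the derived table as one possible guard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dealorder (orderid TEXT PRIMARY KEY, orderno TEXT);
    CREATE TABLE ordertran (orderid TEXT, masteritemid TEXT, qty INTEGER, vendorid TEXT);
    CREATE TABLE invoiceitemview (invoiceno TEXT, orderno TEXT, orderid TEXT);
    INSERT INTO dealorder VALUES ('A', 'N1'), ('B', 'N2');
    -- only m1 and m3 have a vendor, i.e. were sub-rented
    INSERT INTO ordertran VALUES ('A', 'm1', 2, 'v1'), ('A', 'm2', 1, ''), ('B', 'm3', 5, 'v2');
    -- invoice T1 has two lines, both for order A
    INSERT INTO invoiceitemview VALUES ('T1', 'N1', 'A'), ('T1', 'N1', 'A'), ('T2', 'N2', 'B');
""")

def sub_rented_items(derived_select):
    """Join the invoice's orders to their sub-rented items."""
    return conn.execute(f"""
        SELECT d.orderid, d.orderno, t.masteritemid, t.qty
        FROM ({derived_select} orderno, orderid
              FROM invoiceitemview WHERE invoiceno = ?) AS inv
        INNER JOIN dealorder AS d ON inv.orderid = d.orderid
        INNER JOIN ordertran AS t ON d.orderid = t.orderid
        WHERE t.vendorid <> ''
    """, ("T1",)).fetchall()

plain_rows = sub_rented_items("SELECT")
distinct_rows = sub_rented_items("SELECT DISTINCT")

print(len(plain_rows))  # 2
print(distinct_rows)    # [('A', 'N1', 'm1', 2)]
```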