Self join and inner join to remove duplicates

Self join and inner join to remove duplicates - sql

I am stuck on this and I am relatively new to SQL.
Here is the question we were given:
List the productname and vendorid for all products that we have
purchased from more than one vendor (Hint: you’ll need a Self-Join and
an additional INNER JOIN to solve, don't forget to remove any
duplicates!!)
Here is a screenshot of tables we are working with:
Here is what I have.....I know it is wrong. It works to a degree, just not exactly how the prof wants it.
SELECT DISTINCT productname, product_vendors.vendorid
FROM products INNER JOIN Product_Vendors
ON products.PRODUCTNUMBER = PRODUCT_VENDORS.PRODUCTNUMBER
INNER JOIN vendors ON Product_Vendors.VENDORID = vendors.VENDORID
ORDER BY products.PRODUCTNAME;
Expected output provided the prof:

I agree with #jarlh that additional information would be helpful- i.e. are there triplicates in the data or just duplicates, etc.
That said, this should get your started
SELECT
c.productname AS 'Product'
,a.vendorid AS 'Vendor1'
,b.vendorid AS 'Vendor2'
FROM
product_vendors AS a
JOIN
product_vendors AS b
ON
a.productnumber = b.productnumber
AND a.vendorid <> b.vendorid
JOIN
dbo.products AS c
ON
a.productnumber = c.productnumber
This will limit the population of 'Product Vendors' just to products with unmatching vendors.
From there you are joining to products to pull back product name.
Also- work on coding format, clean code makes the dream work :)

The solution to this problem is usually to count vendors per product with COUNT OVER and only stick with products with more than one. Simply:
select productname, vendorid
from
(
select
p.productname,
pv.vendorid,
count(*) over (partition by product) as cnt
from products p
join product_vendors pv using (productnumber)
)
where cnt > 1;
If this shall be done without window functions, then one option is to aggregate product_vendors and use this result:
select p.productname, pv.vendorid
from
(
select productid
from product_vendors
group by productname
having count(*) > 1
) px
join products p using (productid)
join product_vendors pv using (productid);
or check whether exists another vendor for the product:
select
p.productname,
pv.vendorid,
count(*) over (partition by product) as cnt
from products p
join product_vendors pv on pv.productnumber = p.productnumber
where exists
(
select *
from product_vendors other
where other.productnumber = pv.productnumber
and other.vendorid <> pv.vendorid
);
In neither of these approaches I see the need to eliminate duplicates, as there should be one row per product in products and one row per product and vendor in product_vendors. So I guess what your prof was thinking of is:
select distinct
p.productname,
pv.vendorid
from products p
join product_vendors pv on pv.productnumber = p.productnumber
join product_vendors other on other.productnumber = pv.productnumber
and other.vendorid <> pv.vendorid
This, however, is an approach I don't recommend. You'd combine all vendors for a product (e.g. with 10 vendors for one product you already have 45 combinations for that product only, if I'm not mistaken). So you'd create a large intermediate result only to dismiss most of it with DISTINCT later. Don't do that. Remember: SELECT DISTINCT is often an indicator for a poorly written query (i.e. unnecessary joins leading to too many combinations you are not actually interested in).

SELECT DISTINCT p.name AS product, v.id
FROM products p
INNER JOIN product_vendors pv ON p.id = pv.productid
INNER JOIN product_vendors pv2 ON pv.productid = pv2.productid AND pv.vendorid != pv2.vendorid
INNER JOIN vendors v ON v.id = pv.vendorid
ORDER BY p.name

Related

Row is listed twice in simple query

One row is Repeated twice and I can't seem to figure out why. I tried using Group by but couldn't figure that out either lol. Using Left outer Join, to list suppliers who have a discounted product, in the Northwind Database
Select *
From Suppliers s
Left Outer Join products p
On s.SupplierID = p.SupplierID
Where p.Discontinued = 1

You have two discontinued products for a supplier, and a row is created for each row in products that matches the join condition and the Discontinued = 1 predicate. You want something like that:
SELECT * FROM Suppliers s
WHERE EXISTS (SELECT 1
FROM Products p
WHERE p.SupplierID = s.SupplierID
AND p.Discontinued = 1)

You have two rows in one of those tables. You can determine which by querying both tables by themselves on that supplierId.
Now, to your query, by putting p.discontinued in the where, that join effectively becomes an inner join, so you should either flip it to an inner join or move that condition to the join.
To get suppliers with discontinued products, you can do this:
Select * from supplier where supplierId in (
select supplierId from products
where discontinued =1)

There is clearly a supplier that has multiple discontinued products.
If you want suppliers with at least one discounted product, then use exists:
select s.*
from suppliers s
where exists (select 1
from products p
where p.supplierid = s.supplierid and
p.Discontinued = 1
);
If you want the list of suppliers with the number of discontinued products, use join:
select s.*, p.num_discontinued
from supplier s join
(select p.supplierid, count(*) as num_discontinued
from products
where p.Discontinued = 1
group by p.supplierid
) p
on p.SupplierID = s.SupplierID ;
If you want the list of products that are discontinued with their suppliers, than use your query but change the left join to an inner join. An outer join is not necessary.

SQL strategy to fetch maximum

Suppose I have these three tables:
I want to get, for all products, it's product_id and the client that bougth it most times (the biggest client of the product).
I solved it like this:
SELECT
product_id AS product,
(SELECT TOP 1 client_id FROM Bill_Item, Bill
WHERE Bill_Item.product_id = p.product_id
and Bill_Item.bill_id = Bill.bill_id
GROUP BY
client_id
ORDER BY
COUNT(*) DESC
) AS client
FROM Product p
Do you know a better way?

the inner query will give you the ranking. The outer query will give you the client that puchase the most for a product
SELECT *
(
SELECT i.product_id, b.client_id,
r = row_number() over (partition by i.product_id
order by count(*) desc)
FROM Bill b
INNER JOIN Bill_Item i ON b.bill_id = i.bill_id
GROUP BY i.product_id, b.client_id
) d
WHERE r = 1

I was going to submit pretty much the same thing as #Squirrell only with a Common Table Expression [CTE] rather than a derived table. So I wont duplicate that but there are some learning points concerning your query. First is IMPLICIT JOINS such as FROM Bill_Item, Bill are really easy to have uintended consequences (one of many questions: Queries that implicit SQL joins can't do?) Next for the Calculated column you can actually do this in a OUTER APPLY or CROSS APPLY which is a very useful technique.
So you could re-write your method as follows:
SELECT *
FROM
Product p
OUTER APPLY (SELECT TOP 1 b.client_id
FROM
Bill_Item bi
INNER JOIN Bill b
ON bi.bill_id = b.bill_id
WHERE
bi.product_id = p.product_id
GROUP BY
b.client_id
ORDER BY
COUNT(*) DESC) c
And to show you how squirell's answer can still include products that have never been sold all you need to do is join Products and LEFT JOIN to other tables:
;WITH cte AS (
SELECT
p.product_id
,b.client_id
,ROW_NUMBER() OVER (PARTITION BY p.product_id ORDER BY COUNT(*) DESC) as RowNumber
FROM
Product p
LEFT JOIN Bill_Item bi
ON p.product_id = bi.product_id
LEFT JOIN Bill b
ON bi.bill_id = b.bill_id
GROUP BY
p.product_id
,b.client_id
)
SELECT *
FROM
cte
WHERE
RowNumber = 1
Techniques used in some of these that are useful.
CTE
APPLY (Outer & Cross)
Window Functions

Squirrel's answer doesn't return products that have never been sold. If you want to include those, then your approach is ok, although I would write the query as:
SELECT product_id as product,
(SELECT TOP 1 b.client_id
FROM Bill_Item bi JOIN
Bill b
ON bi.bill_id = b.bill_id
WHERE Bill_Item.product_id = p.product_id
GROUP BY client_id
ORDER BY COUNT(*) DESC
) as client
FROM Product p;
You can also express this using APPLY, but a correlated subquery is also fine.
Note the correct use of the explicit JOIN syntax.

Combine Two Tables in Select (SQL Server 2008)

If I have two tables, like this for example:
Table 1 (products)
id
name
price
agentid
Table 2 (agent)
userid
name
email
How do I get a result set from products that include the agents name and email, meaning that products.agentid = agent.userid?
How do I join for example SELECT WHERE price < 100?

Edited to support price filter
You can use the INNER JOIN clause to join those tables. It is done this way:
select p.id, p.name as ProductName, a.userid, a.name as AgentName
from products p
inner join agents a on a.userid = p.agentid
where p.price < 100
Another way to do this is by a WHERE clause:
select p.id, p.name as ProductName, a.userid, a.name as AgentName
from products p, agents a
where a.userid = p.agentid and p.price < 100
Note in the second case you are making a natural product of all rows from both tables and then filtering the result. In the first case you are directly filtering the result while joining in the same step. The DBMS will understand your intentions (regardless of the way you choose to solve this) and handle it in the fastest way.

This is a very rudimentary INNER JOIN:
SELECT
products.name AS productname,
price,
agent.name AS agentname
email
FROM
products
INNER JOIN agent ON products.agentid = agent.userid
I recommend reviewing basic JOIN syntax and concepts. Here's a link to Microsoft's documentation, though what you have above is pretty universal as standard SQL.
Note that the INNER JOIN here assumes every product has an associated agentid that isn't NULL. If there are NULL agentid in products, use LEFT OUTER JOIN instead to return even the products with no agent.

select p.name productname, p.price, a.name as agent_name, a.email
from products p
inner join agent a on (a.userid = p.agentid)

This is my join for slightly larger tables in Prod.Hope it helps.
SELECT TOP 1000 p.[id]
,p.[attributeId]
,p.[name] as PropertyName
,p.[description]
,p.[active],
a.[appId],
a.[activityId],
a.[Name] as AttributeName
FROM [XYZ.Gamification.V2B13.Full].[dbo].[ADM_attributeProperty] p
Inner join [XYZ.Gamification.V2B13.Full].[dbo].[ADM_activityAttribute] a
on a.id=p.attributeId
where a.appId=23098;

select ProductName=p.[name]
, ProductPrice=p.price
, AgentName=a.[name]
, AgentEmail=a.email
from products p
inner join agent a on a.userid=p.agentid

If you don't want to use inner join (or don't have possibility to do it!) and would combine rows, you can use a cross join :
SELECT *
FROM table1
CROSS JOIN table2
or simply
SELECT *
FROM table1, table2

Using sum with a nested select

I'm using SQL Server. This statement lists my products per menu:
SELECT menuname, productname
FROM [web].[dbo].[tblMenus]
FULL OUTER JOIN [web].[dbo].[tblProductsRelMenus]
ON [tblMenus].Id = [tblProductsRelMenus].MenuId
FULL OUTER JOIN [web].[dbo].[tblProducts]
ON [tblProductsRelMenus].ProductId = [tblProducts].ProductId
LEFT JOIN [web].[dbo].[tblOrderDetails]
ON ([tblProducts].Id = [tblOrderDetails].ProductId)
GROUP BY [tblProducts].ProductName
Some products don't have menus and vice versa. I use the following to establish what has been sold of each product.
SELECT [tblProducts].ProductName, SUM([tblOrderDetails].Ammount) as amount
FROM [web].[dbo].[tblProducts]
LEFT JOIN [web].[dbo].[tblOrderDetails]
ON ([tblProducts].ProductId = [tblOrderDetails].ProductId)
GROUP BY [tblProducts].ProductName
What I want to do is complement the top table with an amount column. That is, I want a table with the same number of rows as in the first table above but with an amount value if it exists, otherwise null.
I can't figure out how to do this. Any suggestions?

If I am not missing anything, the second query could be simplified, then incorporated into the first query like this:
SELECT
m.menuname,
p.productname,
t.amount
FROM [web].[dbo].[tblMenus] m
FULL JOIN [web].[dbo].[tblProductsRelMenus] pm ON m.Id = pm.MenuId
FULL JOIN [web].[dbo].[tblProducts] p ON pm.ProductId = p.ProductId
LEFT JOIN (
SELECT ProductId, SUM(Amount) as amount
FROM [web].[dbo].[tblOrderDetails]
GROUP BY ProductId
) t ON p.ProducId = t.ProductId

Left Join returns more records

I have 2 tables with related data.
one table is for products. and the other price. In price table one product may appear several times.
How can I return the result showing the products without containing duplicate rows
My Query is
select p.Product, sum(p.Qty), max(pr.netprice)
from Products p
left outer join Price pr
on p.Product=pr.Product
where p.brand=''
group by p.Product,pr.Product
but return more rows as right table have multiple records
please help

I don't think distinct is the way to go, I think group by should result in what you want if done correctly. Also, I don't think you need to group on values from both tables.. You should really understand what you want to. Give us example data and it will be easier to answer your question. Try this:
select p.Product, sum(p.Qty), max(pr.netprice)
from Products p
left outer join Price pr on p.Product = pr.Product
where p.brand = ''
group by p.Product -- only group on param.

Use the distinct keyword. That will remove duplicates. ALthough, if there are different prices for a given product, there will be one record per unique price per product if you remove the Max().
select DISTINCT p.Product, sum(p.Qty),max(pr.netprice)
from Products p
left outer join Price pr on p.Product=pr.Product
where p.brand='' group by p.Product,pr.Product

Try putting distinct in select. I am not sure it will work, but try it.

If you want the sum then use Tomas answer above. This will give you a unique product list with the total quantity and maximum price for each product
select p.Product, sum(p.Qty), max(pr.netprice)
from Products p
left outer join Price pr on p.Product = pr.Product
where p.brand = ''
group by p.Product

What about changing it this way:
SELECT p.Product, sum(p.Qty),
(SELECT max(pr.netprice)
FROM Price pr
WHERE p.Product=pr.Product
)
FROM Products p
WHERE p.brand=''

Try this
SELECT p.Product, p.Qty, MAX(pr.netprice)
FROM Products p
LEFT OUTER JOIN Price pr ON p.Product=pr.Product
WHERE p.brand=''
GROUP BY p.Product, p.Qty

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Self join and inner join to remove duplicates - sql

SELECT DISTINCT p.name AS product, v.id FROM products p INNER JOIN product_vendors pv ON p.id = pv.productid INNER JOIN product_vendors pv2 ON pv.productid = pv2.productid AND pv.vendorid != pv2.vendorid INNER JOIN vendors v ON v.id = pv.vendorid ORDER BY p.name

Related

Row is listed twice in simple query

SQL strategy to fetch maximum

Combine Two Tables in Select (SQL Server 2008)

Using sum with a nested select

Left Join returns more records

Categories

Resources