SQL Server query for related products

I am trying to get related products, but the issue I'm facing is that the ProductPhotos table has a one-to-many relationship with the Product table. So when I get products by matching category ID, the query also returns every photo of each product, which I do not want. I want only one photo per product from the ProductPhotos table. Is there any way to use DISTINCT in joins, or some other way? Here is what I have done so far:
SELECT [Product].[ID]
,[Thumbnail]
,[ProductName]
,[Model]
,[SKU]
,[Price]
,[IsExclusive]
,[DiscountPercentage]
,[DiscountFixed]
,[NetPrice]
,[Url]
FROM [dbo].[Product]
INNER JOIN [ProductPhotos] ON [ProductPhotos].[ProductID]=[Product].[ID]
INNER JOIN [ProductCategories] ON [ProductCategories].[ProductID]=[Product].[ID]
WHERE [ProductCategories].[CategoryID]=4
The result returns each product several times, once per photo, because the ProductPhotos table has several rows for the same product.
Is there any way to use DISTINCT or GROUP BY on the ProductID column of the ProductPhotos table so that only one row per product comes back from the photos table?

Instead of using inner join, use cross apply:
SELECT . . .
FROM dbo.Product p CROSS APPLY
(SELECT TOP (1) pp.*
FROM ProductPhotos pp
WHERE pp.ProductID = p.id
ORDER BY NEWID()
) pp INNER JOIN
ProductCategories pc
ON pc.ProductID = p.id
WHERE pc.CategoryID = 4;
Notes:
The ORDER BY NEWID() chooses a random photo. You can order by specific columns to get the earliest, latest, biggest, or whatever.
Note that I added table aliases. These make the query easier to write and to read.
You should qualify all column names in your query, so it is clear which tables they come from.
I removed the superfluous square brackets. They just make the query harder to write and to read.
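SQLite cannot run the T-SQL above as written, but the same idea (at most one photo row per product) can be tried with Python's built-in sqlite3 module. This is a minimal sketch. The tables, the PhotoID column and the sample data are hypothetical, and CROSS APPLY is replaced by a correlated subquery with LIMIT 1, a swapped-in technique rather than the answer's exact query:

```python
import sqlite3

# Hypothetical, minimal version of the question's tables (PhotoID is assumed).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE ProductPhotos (PhotoID INTEGER PRIMARY KEY, ProductID INTEGER, Thumbnail TEXT);
CREATE TABLE ProductCategories (ProductID INTEGER, CategoryID INTEGER);
INSERT INTO Product VALUES (1, 'Lamp'), (2, 'Desk'), (3, 'Chair');
INSERT INTO ProductPhotos VALUES
  (10, 1, 'lamp-a.jpg'), (11, 1, 'lamp-b.jpg'),
  (20, 2, 'desk-a.jpg'), (21, 2, 'desk-b.jpg'), (22, 2, 'desk-c.jpg');
INSERT INTO ProductCategories VALUES (1, 4), (2, 4), (3, 4);
""")

# SQLite has no CROSS APPLY, so the TOP (1) lookup becomes a correlated
# subquery with LIMIT 1 inside the join condition. Ordering by PhotoID DESC
# picks the newest photo instead of a random one. As with CROSS APPLY, a
# product without any photo (Chair) drops out of the result.
rows = con.execute("""
SELECT p.ID, p.ProductName, pp.Thumbnail
FROM Product p
JOIN ProductPhotos pp
  ON pp.PhotoID = (SELECT pp2.PhotoID
                   FROM ProductPhotos pp2
                   WHERE pp2.ProductID = p.ID
                   ORDER BY pp2.PhotoID DESC
                   LIMIT 1)
JOIN ProductCategories pc ON pc.ProductID = p.ID
WHERE pc.CategoryID = 4
ORDER BY p.ID
""").fetchall()
print(rows)
```

If products without photos should still be listed, SQL Server's OUTER APPLY plays the same role as a LEFT JOIN here.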

You can use ROW_NUMBER() to return one row per ProductID, like this:
JOIN (SELECT *,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY PhotoID) rn
FROM [ProductPhotos]) [ProductPhotos]
ON [ProductPhotos].[ProductID]=[Product].[ID] AND [ProductPhotos].rn = 1
Instead of this:
JOIN [ProductPhotos] ON [ProductPhotos].[ProductID]=[Product].[ID]
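Here is a runnable sketch of the ROW_NUMBER() approach using Python's sqlite3 module (window functions need SQLite 3.25 or later, which recent Python builds bundle). The table contents are hypothetical, and PhotoID is assumed to be the photo table's key, as in the snippet above:

```python
import sqlite3

# Hypothetical tables and data; PhotoID is the assumed photo primary key.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE ProductPhotos (PhotoID INTEGER PRIMARY KEY, ProductID INTEGER, Thumbnail TEXT);
INSERT INTO Product VALUES (1, 'Lamp'), (2, 'Desk');
INSERT INTO ProductPhotos VALUES
  (10, 1, 'lamp-a.jpg'), (11, 1, 'lamp-b.jpg'),
  (20, 2, 'desk-a.jpg'), (21, 2, 'desk-b.jpg');
""")

# Number the photos of each product (1, 2, ...) in PhotoID order and keep
# only the first one, so every product joins to exactly one photo row.
rows = con.execute("""
SELECT p.ID, p.ProductName, pp.Thumbnail
FROM Product p
JOIN (SELECT ProductID, Thumbnail,
             ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY PhotoID) AS rn
      FROM ProductPhotos) pp
  ON pp.ProductID = p.ID AND pp.rn = 1
ORDER BY p.ID
""").fetchall()
print(rows)
```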

You can use a subquery with DISTINCT in the join instead of joining the table directly.
You can create an alias and use that column as DISTINCT in the SELECT statement, but it will cause performance issues when there is a lot of data.
If you have three different photos for the same product ID (for example, product 2), you can use a subquery with TOP 1, ordered by the primary key descending, to get the latest picture.
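One way to express the "latest picture by primary key" idea as a subquery in the join is to group the photos by product, take the highest PhotoID, and join back to fetch that photo. This is a swapped-in GROUP BY variant rather than the answerer's exact query. The sketch below uses Python's sqlite3 module with hypothetical data, and it assumes that a higher PhotoID means a newer photo:

```python
import sqlite3

# Hypothetical data: product 2 has three photos, product 1 has one.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE ProductPhotos (PhotoID INTEGER PRIMARY KEY, ProductID INTEGER, Thumbnail TEXT);
INSERT INTO Product VALUES (1, 'Lamp'), (2, 'Desk');
INSERT INTO ProductPhotos VALUES
  (10, 1, 'lamp-a.jpg'),
  (20, 2, 'desk-a.jpg'), (21, 2, 'desk-b.jpg'), (22, 2, 'desk-c.jpg');
""")

# The grouped subquery yields exactly one (ProductID, LatestPhotoID) pair per
# product; joining back on PhotoID fetches the matching thumbnail.
rows = con.execute("""
SELECT p.ID, p.ProductName, pp.Thumbnail
FROM Product p
JOIN (SELECT ProductID, MAX(PhotoID) AS LatestPhotoID
      FROM ProductPhotos
      GROUP BY ProductID) latest ON latest.ProductID = p.ID
JOIN ProductPhotos pp ON pp.PhotoID = latest.LatestPhotoID
ORDER BY p.ID
""").fetchall()
print(rows)
```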

Related

SQL how to count the number of relations between two tables and include zeroes?

I have a table of orders, and a table of products contained in these orders. (The products-table has order_id, a foreign key referring to orders.id).
I would like to query the number of products contained in each order. However, I also want orders to be contained in the results if they do not contain any products at all.
This means that a simple
SELECT *, COUNT(*) n_products FROM `orders` INNER JOIN `products` ON `products`.`order_id` = `orders`.`id` GROUP BY `order_id`
does not work, since orders without any products disappear.
Using a LEFT OUTER JOIN instead would add rows without product-information, but the distinction between an order with 1 product and an order with 0 products is lost.
What am I missing here?
You need a left join here, and you should be counting some column from the products table:
SELECT
o.*,
COUNT(p.order_id) AS n_products
FROM orders o
LEFT JOIN products p
ON p.order_id = o.id
GROUP BY
o.id;
Note that I assume the database allows grouping by orders.id and then selecting all columns from that table. Postgres (9.1 and later) does, provided id is the primary key of orders. If yours does not, you can only select o.id in addition to the count.
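The difference between COUNT(p.order_id) and COUNT(*) after a LEFT JOIN is easy to see on a tiny dataset. Below is a minimal sketch using Python's sqlite3 module with hypothetical orders, where order 3 has no products:

```python
import sqlite3

# Hypothetical sample data: order 3 has no products at all.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE products (id INTEGER PRIMARY KEY, order_id INTEGER REFERENCES orders(id));
INSERT INTO orders VALUES (1), (2), (3);
INSERT INTO products VALUES (10, 1), (11, 1), (12, 2);
""")

# COUNT(p.order_id) skips the NULL that the LEFT JOIN produces for order 3,
# so that order is reported with 0 products.
counts = con.execute("""
SELECT o.id, COUNT(p.order_id) AS n_products
FROM orders o
LEFT JOIN products p ON p.order_id = o.id
GROUP BY o.id
ORDER BY o.id
""").fetchall()

# COUNT(*) also counts the NULL-extended row, so order 3 would wrongly show 1.
count_star = con.execute("""
SELECT o.id, COUNT(*)
FROM orders o
LEFT JOIN products p ON p.order_id = o.id
GROUP BY o.id
ORDER BY o.id
""").fetchall()

print(counts)
print(count_star)
```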

Deleting data from one table if the reference doesn't exist in two other tables

I managed to import too much data into one of my database tables. I want to delete most of this data, but I need to ensure that the reference doesn't exist in either of two other tables before I delete it.
I figured this query would be the solution. It gives me the right result on a test database, but in the production environment it returns no hits.
select product
from products
where 1=1
and product not in (select product from location)
and product not in (select product from lines)
If you are getting no results, it most likely means that your location and/or lines table has NULL values in the product column. NOT IN fails when the subquery returns a NULL: the comparison with NULL is unknown rather than true, so no row qualifies.
Try the query below, which just adds a NULL filter to your query:
select product from products
where 1=1
and product not in ( select product from location where product is not null)
and product not in ( select product from lines where product is not null)
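The NULL effect can be reproduced with Python's sqlite3 module. In this minimal sketch the data is hypothetical: location contains one NULL product, which is enough to empty the original query's result:

```python
import sqlite3

# Hypothetical data: location contains a NULL product.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product TEXT);
CREATE TABLE location (product TEXT);
CREATE TABLE lines (product TEXT);
INSERT INTO products VALUES ('A'), ('B'), ('C');
INSERT INTO location VALUES ('A'), (NULL);
INSERT INTO lines VALUES ('B');
""")

# 'C' NOT IN ('A', NULL) evaluates to NULL (unknown), not true, so the
# original query returns nothing once a NULL is present.
naive = con.execute("""
SELECT product FROM products
WHERE product NOT IN (SELECT product FROM location)
  AND product NOT IN (SELECT product FROM lines)
""").fetchall()

# Filtering the NULLs out of the subqueries restores the expected result.
fixed = con.execute("""
SELECT product FROM products
WHERE product NOT IN (SELECT product FROM location WHERE product IS NOT NULL)
  AND product NOT IN (SELECT product FROM lines WHERE product IS NOT NULL)
""").fetchall()

print(naive)
print(fixed)
```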
Use NOT EXISTS instead of NOT IN. It is not affected by NULLs in the subquery and is often more efficient:
DELETE FROM products
WHERE NOT EXISTS (SELECT 1 FROM [Location] WHERE Product = Products.Product)
AND NOT EXISTS (SELECT 1 FROM lines WHERE Product = Products.Product)
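A minimal sqlite3 sketch of the NOT EXISTS delete follows. The data is hypothetical and includes the same kind of NULL that breaks NOT IN. Note that SQLite uses plain table names instead of T-SQL's square brackets:

```python
import sqlite3

# Hypothetical data, including a NULL product in location.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product TEXT);
CREATE TABLE location (product TEXT);
CREATE TABLE lines (product TEXT);
INSERT INTO products VALUES ('A'), ('B'), ('C');
INSERT INTO location VALUES ('A'), (NULL);
INSERT INTO lines VALUES ('B');
""")

# NOT EXISTS only asks whether a matching row exists. The NULL in location
# never matches anything, so it cannot block the delete the way it blocks NOT IN.
cur = con.execute("""
DELETE FROM products
WHERE NOT EXISTS (SELECT 1 FROM location l WHERE l.product = products.product)
  AND NOT EXISTS (SELECT 1 FROM lines li WHERE li.product = products.product)
""")
deleted = cur.rowcount
remaining = con.execute("SELECT product FROM products ORDER BY product").fetchall()
print(deleted, remaining)
```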
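Try this:
DELETE FROM Products where not exists
(select 1 from Location
join lines on lines.Product = Location.Product
and Location.Product = Products.Product
);
Note that this deletes every product that is not present in both Location and lines at the same time, so a product referenced by only one of the two tables is still deleted. To keep such products, check each table with its own NOT EXISTS.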
It's difficult to tell from your post why the query would return results in the test database but not in production, other than that the data or the table structures differ. You might try including the DDL for the participating tables in your post so that we know what the table structures are. For example, is the "product" column a PK or a text name?
One thing that does jump out is that your query will probably perform poorly. Try something like this instead: (Assuming the "product" column is a PK in Products and FK in the other tables.)
Select p.product
From Products As p
Left Outer Join Location As l
On p.product = l.product
Left Outer Join Lines as li
On p.product = li.product
Where l.product is null
And li.product is null;
This simple set-based approach may help:
DELETE p
FROM products p
LEFT JOIN location lo ON p.product = lo.product
LEFT JOIN lines li ON p.product = li.product
WHERE lo.product IS NULL AND li.product IS NULL
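The LEFT JOIN anti-join can be checked with Python's sqlite3 module. This sketch uses hypothetical data. It also shows why the IS NULL tests belong in WHERE rather than ON. SQLite's DELETE has no JOIN clause, so the anti-join is reused as a subquery there, which differs from the T-SQL `DELETE p FROM ...` form:

```python
import sqlite3

# Hypothetical data: only 'C' is referenced by neither table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product TEXT);
CREATE TABLE location (product TEXT);
CREATE TABLE lines (product TEXT);
INSERT INTO products VALUES ('A'), ('B'), ('C');
INSERT INTO location VALUES ('A'), (NULL);
INSERT INTO lines VALUES ('B');
""")

ANTI_JOIN = """
SELECT p.product
FROM products p
LEFT JOIN location lo ON p.product = lo.product
LEFT JOIN lines li ON p.product = li.product
WHERE lo.product IS NULL AND li.product IS NULL
"""

# Anti-join: keep products for which both LEFT JOINs found no partner.
orphans = con.execute(ANTI_JOIN + " ORDER BY p.product").fetchall()

# With the IS NULL tests inside ON, the join condition can never be true,
# so every product comes back unfiltered.
in_on = con.execute("""
SELECT p.product
FROM products p
LEFT JOIN location lo ON p.product = lo.product AND lo.product IS NULL
LEFT JOIN lines li ON p.product = li.product AND li.product IS NULL
ORDER BY p.product
""").fetchall()

# SQLite-specific delete: reuse the anti-join as an IN subquery.
con.execute("DELETE FROM products WHERE product IN (" + ANTI_JOIN + ")")
remaining = con.execute("SELECT product FROM products ORDER BY product").fetchall()
print(orphans, in_on, remaining)
```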

Get row from one table, plus COUNT from a related table

I'm trying to build an SQL query where I grab one table's information (WHERE shops.shop_domain = X) along with the COUNT of the customers table WHERE customers.product_id = 4242451.
The shops table DOES NOT have product.id in it, but the customers table DOES HAVE the shop_domain in it, hence my attempt to do some sort of join.
I essentially want to return the following:
shops.id
shops.name
shops.shop_domain
COUNT OF CUSTOMERS WHERE customers.product_id = '4242451'
Here is my not so lovely attempt at the query.
I think I have the idea right (maybe...) but I can't wrap my head around building this query.
SELECT shops.id, shops.name, shops.shop_domain, COUNT(customers.customer_id)
FROM shops
LEFT JOIN customers ON shops.shop_domain = customers.shop_domain
WHERE shops.shop_domain = 'myshop.com' AND
customers.product_id = '4242451'
GROUP BY shops.shop_id
Relevant database schemas:
shops:
id, name, shop_domain
customers:
id, name, product_id, shop_domain
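You are close. The condition on customers needs to go in the ON clause, because this is a LEFT JOIN and customers is the second table. In the WHERE clause, the condition filters out shops that have no matching customers, which effectively turns the query into an inner join: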
SELECT s.id, s.name, s.shop_domain, COUNT(c.id)
FROM shops s LEFT JOIN
customers c
ON s.shop_domain = c.shop_domain AND c.product_id = '4242451'
WHERE s.shop_domain = 'myshop.com'
GROUP BY s.id, s.name, s.shop_domain;
I am also inclined to include all three columns in the GROUP BY, although Postgres (and ANSI/ISO standards) are happy with just id if it is declared as the primary key in the table.
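The ON-versus-WHERE difference shows up when a shop has no matching customer. Here is a minimal sqlite3 sketch using the schema from the question with hypothetical rows, where myshop.com has no customer for product 4242451:

```python
import sqlite3

# Hypothetical data: myshop.com has no customer for product 4242451.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT, shop_domain TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, product_id INTEGER, shop_domain TEXT);
INSERT INTO shops VALUES (1, 'My Shop', 'myshop.com');
INSERT INTO customers VALUES (1, 'Ann', 111, 'myshop.com'), (2, 'Bob', 4242451, 'other.com');
""")

# Filter in WHERE: joined rows whose customer fails the product test are
# discarded after the join, so the shop disappears instead of counting 0.
in_where = con.execute("""
SELECT s.id, s.name, s.shop_domain, COUNT(c.id)
FROM shops s
LEFT JOIN customers c ON s.shop_domain = c.shop_domain
WHERE s.shop_domain = 'myshop.com' AND c.product_id = 4242451
GROUP BY s.id, s.name, s.shop_domain
""").fetchall()

# Filter in ON: the shop is kept with a NULL-extended customer row,
# and COUNT(c.id) ignores that NULL.
in_on = con.execute("""
SELECT s.id, s.name, s.shop_domain, COUNT(c.id)
FROM shops s
LEFT JOIN customers c
  ON s.shop_domain = c.shop_domain AND c.product_id = 4242451
WHERE s.shop_domain = 'myshop.com'
GROUP BY s.id, s.name, s.shop_domain
""").fetchall()

print(in_where)
print(in_on)
```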
A correlated subquery should be substantially cheaper (and simpler) for the purpose:
SELECT id, name, shop_domain
, (SELECT count(*)
FROM customers
WHERE shop_domain = s.shop_domain
AND product_id = 4242451) AS special_count
FROM shops s
WHERE shop_domain = 'myshop.com';
This way you only need to aggregate in the subquery, and need not worry about undesired effects on the outer query.
I assume product_id is a numeric data type, so I use a numeric literal (4242451) instead of the string literal '4242451', which might otherwise cause problems.
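The correlated-subquery version can be tried with Python's sqlite3 module. The rows are hypothetical, and the product ID and domain are passed as bound parameters, with the ID as an integer in line with the numeric-literal advice:

```python
import sqlite3

# Hypothetical data: two customers of myshop.com bought product 4242451.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT, shop_domain TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, product_id INTEGER, shop_domain TEXT);
INSERT INTO shops VALUES (1, 'My Shop', 'myshop.com'), (2, 'Other Shop', 'other.com');
INSERT INTO customers VALUES
  (1, 'Ann', 4242451, 'myshop.com'),
  (2, 'Bob', 4242451, 'myshop.com'),
  (3, 'Cy', 111, 'myshop.com'),
  (4, 'Di', 4242451, 'other.com');
""")

# The subquery runs per outer shop row, so the outer query needs no GROUP BY.
# Parameters bind in textual order: first the product ID, then the domain.
rows = con.execute("""
SELECT id, name, shop_domain,
       (SELECT count(*)
        FROM customers
        WHERE shop_domain = s.shop_domain
          AND product_id = ?) AS special_count
FROM shops s
WHERE shop_domain = ?
""", (4242451, "myshop.com")).fetchall()
print(rows)
```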

SQL Beginner: Getting items from 2 tables (+grouping+ordering)

I have an e-commerce website (using VirtueMart), and I sell products that consist of child products. When a product is a parent, it doesn't have a ParentID, while its children refer to it. I know, not the best logic, but I didn't create it.
My SQL is very basic, and I believe what I'm asking for is quite easy to achieve:
Select products that have children.
Sort results by prices (ASC/DSC).
SELECT * FROM Products INNER JOIN Prices ON Products.ProductID = Prices.ProductID ORDER BY Prices.Price [ASC/DESC]
Explanation:
SELECT - Select (get/retrieve)
* - All columns
FROM Products - Get them from a database table named "Products".
INNER JOIN Prices - Joins table "Products" with table "Prices", returning only the rows that have a match in both tables.
ON - Like WHERE, this defines the condition a pair of rows must meet to count as a match.
Products.ProductID = Prices.ProductID - Your match criteria: return the rows whose "ProductID" exists in both "Products" and "Prices".
ORDER BY Prices.Price [ASC/DESC] - Sorting. Use ASC for ascending, DESC for descending.
This table design is subpar for a number of reasons. First, it appears that the value 0 is being used to indicate lack of a parent (as there's no 0 ID for products). Typically this will be a NULL value instead.
If it were a NULL value, the SQL statement to get everything without a parent would be as simple as this:
SELECT * FROM Products WHERE ParentID IS NULL
However, we can't do that. If we make the assumption that 0 = no parent, we can do this:
SELECT * FROM Products WHERE ParentID = 0
However, that's a dangerous assumption to make. Thus, the correct way to do this (given your schema above), would be to compare the two tables and ensure that the parentID exists as a ProductID:
SELECT a.*
FROM Products AS a
WHERE EXISTS (SELECT * FROM Products AS b WHERE a.ProductID = b.ParentID)
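The difference between "has no parent" and "has children" can be seen on a small sample. Here is a minimal sqlite3 sketch with a hypothetical Products table in which 0 means "no parent", as in the question, and 'Single' is a top-level product without children:

```python
import sqlite3

# Hypothetical schema and data; 0 stands for "no parent".
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, ParentID INTEGER);
INSERT INTO Products VALUES
  (1, 'Bundle A', 0), (2, 'Part A1', 1), (3, 'Part A2', 1),
  (4, 'Bundle B', 0), (5, 'Part B1', 4),
  (6, 'Single', 0);
""")

# ParentID = 0 returns every top-level product, including 'Single',
# which has no children.
top_level = con.execute(
    "SELECT ProductID FROM Products WHERE ParentID = 0 ORDER BY ProductID"
).fetchall()

# EXISTS keeps only the products that some other row names as its parent.
with_children = con.execute("""
SELECT a.ProductID, a.ProductName
FROM Products AS a
WHERE EXISTS (SELECT 1 FROM Products AS b WHERE b.ParentID = a.ProductID)
ORDER BY a.ProductID
""").fetchall()

print(top_level)
print(with_children)
```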
Next, to get the pricing, we have to join those two tables together on a common ID. As the Prices table seems to reference a ProductID, we can use that like so:
SELECT p.ProductID, p.ProductName, pr.Price
FROM Products AS p INNER JOIN Prices AS pr ON p.ProductID = pr.ProductID
WHERE EXISTS (SELECT * FROM Products AS b WHERE p.ProductID = b.ParentID)
ORDER BY pr.Price
That might be sufficient per the data you've shown, but usually that type of table structure indicates that it's possible to have more than one price associated with a product (we're unable to tell whether this is true based on the quick snapshot).
That should get you close... if you need something more, we'll need more detail.
Use the script below if you are using SQL Server (SSMS):
SELECT pd.ProductId,ProductName,Price
FROM product pd
LEFT JOIN price pr ON pd.ProductId=pr.ProductID
WHERE EXISTS (SELECT 1 FROM product pd1 WHERE pd.productID=pd1.ParentID)
ORDER BY pr.Price ASC
Note: neither of your parent products has a price in the Price table. If you want the sum of their child products' prices, use the script below:
SELECT pd.ProductId,pd.ProductName,SUM(ISNULL(pr.Price,0)) SUM_ChildPrice
FROM product pd
LEFT JOIN product pd1 ON pd.productID=pd1.ParentID
LEFT JOIN price pr ON pd1.ProductId=pr.ProductID
GROUP BY pd.ProductId,pd.ProductName
ORDER BY SUM_ChildPrice ASC
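Below is a minimal sqlite3 sketch of summing child prices per parent and sorting by that sum. The data is hypothetical. Two choices here are deliberate and differ from the script: the self-join is an inner join, so only products that actually have children appear, and COALESCE stands in for T-SQL's ISNULL:

```python
import sqlite3

# Hypothetical schema and data; 0 stands for "no parent".
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, ParentID INTEGER);
CREATE TABLE Prices (ProductID INTEGER, Price REAL);
INSERT INTO Products VALUES
  (1, 'Bundle A', 0), (2, 'Part A1', 1), (3, 'Part A2', 1),
  (4, 'Bundle B', 0), (5, 'Part B1', 4),
  (6, 'Single', 0);
INSERT INTO Prices VALUES (2, 5.0), (3, 7.5), (5, 3.0), (6, 9.0);
""")

# Inner self-join keeps parents only; the aggregate is sorted by its alias,
# since the individual pr.Price values no longer exist after GROUP BY.
rows = con.execute("""
SELECT pd.ProductID, pd.ProductName, SUM(COALESCE(pr.Price, 0)) AS SUM_ChildPrice
FROM Products pd
JOIN Products pd1 ON pd.ProductID = pd1.ParentID
LEFT JOIN Prices pr ON pd1.ProductID = pr.ProductID
GROUP BY pd.ProductID, pd.ProductName
ORDER BY SUM_ChildPrice ASC
""").fetchall()
print(rows)
```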
You will have to use a self-join, for example:
SELECT * FROM products parent
JOIN products children ON parent.id = children.parent_id
JOIN prices ON prices.product_id = children.id
ORDER BY prices.price
Because we are using JOIN it will filter out all entries that don't have any children.
I haven't tested it, but I hope it works.

Distinct on multi-columns in sql

I have this query in sql
select cartlines.id, cartlines.pageId, cartlines.quantity, cartlines.price
from orders
INNER JOIN cartlines on (cartlines.orderId = orders.id)
where userId = 5
I want to get rows that are distinct by pageId, so that in the end no pageId appears in more than one row (no duplicates).
Any ideas?
Thanks
Baaroz
Going by what you're expecting in the output and your comment that says "...if there rows in output that contain same pageid only one will be shown...," it sounds like you're trying to get the top record for each page ID. This can be achieved with ROW_NUMBER() and PARTITION BY:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID) rowNumber,
c.id,
c.pageId,
c.quantity,
c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
) a
WHERE a.rowNumber = 1
You can also use ROW_NUMBER() OVER (PARTITION BY ...) together with TOP 1 WITH TIES. It is much cleaner, but it runs a little slower:
SELECT TOP 1 WITH TIES c.id, c.pageId, c.quantity, c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
ORDER BY ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID)
If you want to remove rows in which all columns are duplicated, simply add DISTINCT to your query:
select distinct cartlines.id, cartlines.pageId, cartlines.quantity, cartlines.price
from orders
INNER JOIN cartlines on (cartlines.orderId = orders.id)
where userId = 5
If, however, this makes no difference, the other columns have different values, so each combination of column values forms a distinct (unique) row.
As Michael Berkowski stated in the comments:
DISTINCT does operate over all columns in the SELECT list, so we need to understand your special case better.
If simply adding DISTINCT does not cover your case, you also need to remove the columns whose values differ from row to row, or use aggregate functions to get aggregate values per pageId.
Example - total quantity per distinct pageId:
select cartlines.pageId, sum(cartlines.quantity) as totalQuantity
from orders
INNER JOIN cartlines on (cartlines.orderId = orders.id)
where userId = 5
group by cartlines.pageId
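A per-pageId aggregate needs a GROUP BY and must drop the per-line columns (id, price) that differ between rows. Below is a minimal sqlite3 sketch. The cart data is hypothetical, and userId is assumed to be a column of orders:

```python
import sqlite3

# Hypothetical data: user 5 has two orders, both containing page 100.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, userId INTEGER);
CREATE TABLE cartlines (id INTEGER PRIMARY KEY, orderId INTEGER, pageId INTEGER,
                        quantity INTEGER, price REAL);
INSERT INTO orders VALUES (1, 5), (2, 5), (3, 7);
INSERT INTO cartlines VALUES
  (1, 1, 100, 2, 9.99), (2, 1, 200, 1, 4.50),
  (3, 2, 100, 3, 9.99),
  (4, 3, 100, 10, 9.99);
""")

# One output row per pageId: the quantities of user 5's cart lines are summed,
# and user 7's line for page 100 is excluded by the WHERE clause.
rows = con.execute("""
SELECT c.pageId, SUM(c.quantity) AS totalQuantity
FROM orders o
JOIN cartlines c ON c.orderId = o.id
WHERE o.userId = 5
GROUP BY c.pageId
ORDER BY c.pageId
""").fetchall()
print(rows)
```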
If this is still not what you want, you need to give us sample data and specify more precisely what you expect.