Delete duplicates by keeping the cheapest price only - sql

I am working on an product catalog (ecommerce), stored in a PostgreSQL database. I currently have duplicates. I would like to remove those duplicated products by keeping the cheapest one only.
The fields in database that are important :
ID [PK] SKU EAN Price ....
1 SKU1 123 45.0 ....
2 SKU2 456 36.0 ....
3 SKU3 123 40.0 ....
4 SKU4 789 58.0 ....
5 SKU5 123 38.0 ....
...
I have a SERIAL PRIMARY KEY on the field ID.
I have a NOT NULL SKU, a NOT NULL EAN-13 code and a NOT NULL price for each product.
We can see that the EAN "123" is duplicated several times. I would like to find a SQL request that deletes all duplicates (all the line), by keeping only ONE, which would have the lowest price.
We would have :
ID [PK] SKU EAN Price ....
2 SKU2 456 36.0 ....
4 SKU4 789 58.0 ....
5 SKU5 123 38.0 ....
...
To know : the number of duplicates can be variable. Here is an example with 3 products with the same EAN, but we could have 2, 4, 8 or 587...
So far I've been able to delete the duplicate with the lowest or greatest ID in the case of 2 duplicates only, but it's not what I am trying to find...
FROM
(SELECT Price,
MIN(Price) OVER( PARTITION BY ean ORDER BY Price DESC ) AS row_num FROM TABLE ) t
WHERE t.row_num > 1 );

I would do this using a correlated subquery:
delete from mytable t
where t.price > (select min(t2.price) from mytable t2 where t2.sku = t.sku);

Here is one solution that uses Postgres DELETE ... USING syntax:
DELETE
FROM mytable t1
USING mytable t2
WHERE t1.sku = t2.sku AND t1.price > t2.price
This will remove records with duplicate skus while retaining the one with the smallest price.

Related

How to add a query to a table in SQL?

I have 3 tables.
For simplicity I changed them to these sample tables.
table1: CorporateActionSmmary
RATE Quantity ProductID
--------------------------
56 0 1487
30 0 1871
40 0 8750
table2# ProductMaster
RATEGROSS ISIN ProductID
--------------------------
60 JP0001 1487
33 JP0002 1871
45 JP0003 8750
table3# OpenPosition
Quantity ProductID
-------------------
5 1487
1 1487
5 1487
3 1871
2 1871
4 8750
2 8750
7 8750
3 8750
First I need to add ISIN from table2 to table1
table1: CorporateActionSmmary
RATE Quantity ProductID ISIN
-------------------------------------
56 0 1487 JP0001
30 0 1871 JP0002
40 0 8750 JP0003
So, I used this code
SELECT [dbo].[CorporateActionSummary].*, [dbo].[ProductMaster].[ISIN]
FROM [dbo].[CorporateActionSummary] JOIN [dbo].[ProductMaster] ON CorporateActionSummary.ProductID = ProductMaster.ProductID
Now as you can see the Quantity is missing in Table1 so I have to add-up all the quantities in Table3 for each product ID and add to Table1(as a new column or over-write the Quntity column)
I think I can get the sum of each ProductID's Quantity by the following code, But how can I add it to Table1 that already has ISIN column
SELECT SUM(Qantity),ProductID
FROM [dbo].[OpenPositions]
I am super new to SQL, please explain in detail if it is possible, thank you
I am using Microsoft SQL Server Management Studio
you can sum the quantities and then join with your query like so:
SELECT CA.*, PM.[ISIN],CA.Quantity
FROM [dbo].[CorporateActionSummary] CA
JOIN [dbo].[ProductMaster] PM
ON CA.ProductID = PM.ProductID
JOIN (
SELECT ProductID, SUM(Qantity) Quantity
FROM [dbo].[OpenPositions]
GROUP BY ProductID
) OO
on OO.ProductID = CA.ProductID
you are almost there.. you just need to use the same logic to join to the product master table. However, since you need the total of quantity, you need to group by the other columns you select (but not aggregate).
The query will be something like this :
SELECT
[dbo].[CorporateActionSummary].ProductID
, [dbo].[ProductMaster].[ISIN]
,sum([OpenPosition].Quantity) as quantity
FROM [dbo].[CorporateActionSummary]
JOIN [dbo].[ProductMaster]
ON CorporateActionSummary.ProductID = ProductMaster.ProductID
JOIN [dbo].[OpenPosition]
ON CorporateActionSummary.ProductID = OpenPosition.ProductID
group by
[dbo].[CorporateActionSummary].ProductID
, [dbo].[ProductMaster].[ISIN]
if you want to add more columns to your select, then you need to group by those colums as well

Joining tables and aggregation/sub queries

I have 2 tables as show below.
Table1
Order ID
Item_code
Sales_Price
Qty_ordered
Total
Qty shipped
1000
111
10
5
$50
1
1000
222
20
10
$200
2
1000
333
30
15
$450
0
I have another table that stores only the details of how much was invoiced (i.e. how much we shipped)
Table2 (because we shipped only 10x1 and 20x2 = $50)
Order ID
Invoice_total
1000
$50
I wrote the following query,
select T1.Order_ID,
sum(T1.Qty_Ordered) as Qty_Ordered,
sum(T1.Total) as Total_Amt_ordered,
sum(T1.Qty_shipped) as Qty_Shipped,
sum(T2.Invoice_total)
from T1 join
T2 on T1.Order_ID = T2.Order_ID
This query gives me the following output, (It is adding $50 to all the rows of T1 Orders).
Order ID
Qty_ordered
Total
Qty shipped
Invoice_total
1000
30
$700
3
$150
Whereas now, I want my output to be as:
Order ID
Qty_ordered
Total
Qty shipped
Invoice_total
1000
30
$700
3
$50
(because we shipped only $50)
What changes should I make to my query?
I know I can just hard code it but my database has 1000's of orders and 1000's of Half shipped Orders. I want to keep track of Shipped $ (Invoiced $) for all the orders.
If I understand correctly, you want:
select T2.Order_ID, T2.Invoice_total,
sum(T1.Qty_Ordered) as Qty_Ordered,
sum(T1.Total) as Total_Amt_ordered,
sum(T1.Qty_shipped) as Qty_Shipped,
from T2 join
T1
on T1.Order_ID = T2.Order_ID
group by T2.Order_ID, T2.Invoice_total;
That is, you don't want to aggregate Invoice_total. You just want it to be a group by key.

Trying to group quantities based off an ID

We have two columns one with ID and another with QTY. And the layout goes along the lines of:
ID QTY
-------------
123 456
123 634
123 4235
234 67
234 735
234 666
What I am trying to do is add up all the numbers based off the ID so it would look like:
ID QTY
-------------
123 5325
234 1468
I currently have the following SQL query:
SELECT CLIENT_ID, ID, QTY_ON_HAND,
SUM(QTY_ON_HAND)
FROM
(select CLIENT_ID, ID, QTY_ON_HAND
FROM INVENTORY
WHERE CLIENT_ID = '(CLIENT ID HERE)')
GROUP BY QTY_ON_HAND
It would be appreciated if anyone can tell me simple way on how to do this.
I do not have a test DB at hand, but it should be this:
select
ID,
sum(QTY) as TOTAL
from
YourTableName
group by
ID;
YourTableName ... name of data table with two columns ID, QTY. Be aware of whole table name, it can be also something like dbo.yourtablename, etc.

Select all rows based on alternative publisher

I want list all the rows by alternative publisher with price ascending, see the example table below.
id publisher price
1 ABC 100.00
2 ABC 150.00
3 ABC 105.00
4 XYZ 135.00
5 XYZ 110.00
6 PQR 105.00
7 PQR 125.00
The expected result would be:
id publisher price
1 ABC 100.00
6 PQR 105.00
5 XYZ 110.00
3 ABC 105.00
7 PQR 125.00
4 XYZ 135.00
2 ABC 150.00
What would be the required SQL?
This should do it:
select id, publisher, price
from (
select id, publisher, price,
row_number() over (partition by publisher order by price) as rn
from publisher
) t
order by rn, publisher, price
The window functions assigns unique numbers for each publisher price. Based on that the outer order by will then first display all rows with rn = 1 which are the rows for each publisher with the lowest price. The second row for each publisher has the second lowest price and so on.
SQLFiddle example: http://sqlfiddle.com/#!4/06ece/2
SELECT id, publisher, price
FROM tbl
ORDER BY row_number() OVER (PARTITION BY publisher ORDER BY price), publisher;
One cannot use the output of window functions in the WHERE or HAVING BY clauses because window functions are applied after those. But one can use window functions in the ORDER BY clause.
SQL Fiddle.
Not sure what your table name is - I have called it publishertable. But the following will order the result by price in ascending order - which is the result you are looking for:
select id, publisher, price from publishertable order by price asc
if I've got it right. You should use ROW_NUMBER() function to range prices inside of each publisher and then order by this range and publisher.
SELECT ID,
Publisher,
Price,
Row_number() OVER (PARTITION BY Publisher ORDER BY Price) as rn
FROM T
ORDER BY RN,Publisher
SQLFiddle demo

How to Retrieve Maximum Value of Each Group? - SQL

There is a table tbl_products that contains data as shown below:
Id Name
----------
1 P1
2 P2
3 P3
4 P4
5 P5
6 P6
And another table tbl_inputs that contains data as shown below:
Id Product_Id Price Register_Date
----------------------------------------
1 1 10 2010-01-01
2 1 20 2010-10-11
3 1 30 2011-01-01
4 2 100 2010-01-01
5 2 200 2009-01-01
6 3 500 2011-01-01
7 3 270 2010-10-15
8 4 80 2010-01-01
9 4 50 2010-02-02
10 4 92 2011-01-01
I want to select all products(id, name, price, register_date) with maximum date in each group.
For Example:
Id Name Price Register_Date
----------------------------------------
3 P1 30 2011-01-01
4 P2 100 2010-01-01
6 P3 500 2011-01-01
10 P4 92 2011-01-01
select
id
,name
,code
,price
from tbl_products tp
cross apply (
select top 1 price
from tbl_inputs ti
where ti.product_id = tp.id
order by register_date desc
) tii
Although is not the optimum way you can do it like:
;with gb as (
select
distinct
product_id
,max(register_date) As max_register_date
from tbl_inputs
group by product_id
)
select
id
,product_id
,price
,register_date
from tbl_inputs ti
join gb
on ti.product_id=gb.product_id
and ti.register_date = gb.max_register_date
But as I said earlier .. this is not the way to go in this case.
;with cte as
(
select t1.id, t1.name, t1.code, t2.price, t2.register_date,
row_number() over (partition by product_id order by register_date desc) rn
from tbl_products t1
join tbl_inputs t2
on t1.id = t2.product_id
)
select id, name, code, price, register_date
from cte
where rn = 1
Something like this..
select id, product_id, price, max(register_date)
from tbl_inputs
group by id, product_id, price
you can use the max function and the group by clause. if you only need results from the table tbl_inputs you even don't need a join
select product_id, max(register_date), price
from tbl_inputs
group by product_id, price
if you need field from the tbl_prducts you have to use a join.
select p.name, p. code, i.id, i.price, max(i.register_date)
from tbl_products p join tbl_inputs i on p.id=i.product_id
grooup by p.name, p. code, i.id, i.price
Try this:
SELECT id, product_id, price, register_date
FROM tbl_inputs T1 INNER JOIN
(
SELECT product_id, MAX(register_date) As Max_register_date
FROM tbl_inputs
GROUP BY product_id
) T2 ON(T1.product_id= T2.product_id AND T1.register_date= T2.Max_register_date)
This is, of course, assuming your dates are unique. if they are not, you need to add the DISTINCT Keyword to the outer SELECT statement.
edit
Sorry, I didn't explain it very well. Your dates can be duplicated, it's not a problem as long as they are unique per product id. if you can have duplicated dates per product id, then you will have more then one row per product in the outcome of the select statement I suggested, and you will have to find a way to reduce it to one row per product.
i.e:
If you have records like that (when the last date for a product appears more then once in your table with different prices)
id | product_Id | price | register_date
--------------------------------------------
1 | 1 | 10.00 | 01/01/2000
2 | 1 | 20.00 | 01/01/2000
it will result in having both of these records as outcome.
However, if the register_date is unique per product id, then you will get only one result for each product id.