Delete where one column contains duplicates - sql

consider the below:
ProductID Supplier
--------- --------
111 Microsoft
112 Microsoft
222 Apple Mac
222 Apple
223 Apple
In this example product 222 is repeated because the supplier is known as two names in the data supplied.
I have data like this for thousands of products. How can I delete the duplicate products or select individual results - something like a self join with SELECT TOP 1 or something like that?
Thanks!

I think you want to do the following:
select t.*
from (select t.*,
row_number() over (partition by product_id order by (select NULL)) as seqnum
from t
) t
where seqnum = 1
This selects an arbitrary row for each product.
To delete all rows but one, you can use the same idea:
with todelete (
(select t.*,
row_number() over (partition by product_id order by (select NULL)) as seqnum
from t
)
delete from to_delete where seqnum > 1

DELETE a
FROM tableName a
LEFT JOIN
(
SELECT Supplier, MIN(ProductID) min_ID
FROM tableName
GROUP BY Supplier
) b ON a.supplier = b.supplier AND
a.ProductID = b.min_ID
WHERE b.Supplier IS NULL
SQLFiddle Demo
or if you want to delete productID which has more than onbe product
WITH cte
AS
(
SELECT ProductID, Supplier,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Supplier) rn
FROM tableName
)
DELETE FROM cte WHERE rn > 1
SQLFiddle Demo

;WITH Products_CTE AS
(
SELECT ProductID, Supplier,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY <some value>) as rn
FROM PRODUCTS
)
SELECT *
FROM Products_CTE
WHERE rn = 1
The some value is going to be the key that determines which version of Supplier you keep. If you want the first instance of the supplier, you could use the DateAdded column, if it exists.

Related

Selecting the latest order

I need to select the data of all my customers with the records displayed in the image. But I need to get the most recent record only, for example I need to get the order # E987 for John and E888 for Adam. As you can see from the example, when I do the select statement, I get all the order records.
You don't mention the specific database, so I'll answer with a generic solution.
You can do:
select *
from (
select t.*,
row_number() over(partition by name order by order_date desc) as rn
from t
) x
where rn = 1
You can use analytical function row_number.
Select * from
(Select t.*,
Row_number() over (partition by customer_id order by order_date desc) as rn
From your_table t) t
Where rn = 1
Or you can use not exists as follows:
Select *
From yoir_table t
Where not exists
(Select 1 from your_table tt
Where t.customer_id = tt.custome_id
And tt.order_date > t.order_date)
You can do it with a subquery that finds the last order date.
SELECT t.*
FROM yoir_table t
JOIN (SELECT tt.custome_id,
MAX(tt.order_date) MaxOrderDate
FROM yoir_table tt
GROUP BY tt.custome_id) AS tt
ON t.custome_id = tt.custome_id
AND t.order_date = tt.MaxOrderDate

Getting the lastest entry grouped by ID

I have a table with stock for products. The problem is that every time there is a stock change, the new value is stored, together with the new Quantity. Example:
ProductID | Quantity | LastUpdate
1 123 2019.01.01
2 234 2019.01.01
1 444 2019.01.02
2 222 2019.01.02
I therefore need to get the latest stock update for every Product and return this:
ProductID | Quantity
1 444
2 222
The following SQL works, but is slow.
SELECT ProductID, Quantity
FROM (
SELECT ProductID, Quantity
FROM Stock
WHERE LastUpdate
IN (SELECT MAX(LastUpdate) FROM Stock GROUP BY ProductID)
)
Since the query is slow and supposed to be left joined into another query, I really would like some input on how to do this better.
Is there another way?
Use analytic functions. row_number can be used in this case.
SELECT ProductID, Quantity
FROM (SELECT ProductID, Quantity, row_number() over(partition by ProductID order by LstUpdte desc) as rnum
FROM Stock
) s
WHERE RNUM = 1
Or with first_value.
SELECT DISTINCT ProductID, FIRST_VALUE(Quantity) OVER(partition by ProductID order by LstUpdte desc) as quantuity
FROM Stock
Just another option is using WITH TIES in concert with Row_Number()
Full Disclosure: Vamsi's answer will be a nudge more performant.
Example
Select Top 1 with ties *
From YourTable
Order by Row_Number() over (Partition By ProductID Order by LastUpdate Desc)
Returns
ProductID Quantity LastUpdate
1 444 2019-01-02
2 222 2019-01-02
So you Could use a CTE(Common Table Expression)
Base Data:
SELECT 1 AS ProductID
,123 AS Quantity
,'2019-01-01' as LastUpdate
INTO #table
UNION
SELECT 2 AS ProductID
,234 AS Quantity
,'2019-01-01' as LastUpdate
UNION
SELECT 1 AS ProductID
,444 AS Quantity
,'2019-01-02' as LastUpdate
UNION
SELECT 2 AS ProductID
,222 AS Quantity
,'2019-01-02' as LastUpdate
Here is the code using a Common Table Expression.
WITH CTE (ProductID, Quantity, LastUpdate, Rnk)
AS
(
SELECT ProductID
,Quantity
,LastUpdate
,ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY LastUpdate DESC) AS Rnk
FROM #table
)
SELECT ProductID, Quantity, LastUpdate
FROM CTE
WHERE rnk = 1
Returns
You could then Join the CTE to whatever table you need.
row_number() function might be the most efficient, but the big slow down in your query is the use of the IN statement when used on a subquery, it's a little bit of a tricky one but a join is faster. This query should get what you want and be much faster.
SELECT
a.ProductID
,a.Quantity
FROM stock as a
INNER JOIN (
SELECT
ProductID
,MAX(LastUpdate) as LastUpdate
FROM stock
GROUP BY ProductID
) b
ON a.ProductID = b.ProductId AND
a.LastUpdate = b.LastUpdate

Get most expensive and cheapest items from two tables

I'm trying to get the most expensive and cheapest items from two different tables.
The output should be one row with the values for MostExpensiveItem, MostExpensivePrice, CheapestItem, CheapestPrice
I was able to get the price of the most expensive and cheapest items in the two tables with following query:
SELECT
MAX(ExtrasPrice) as MostExpensivePrice, MIN(ExtrasPrice) as CheapestPrice
FROM
(
SELECT ExtrasPrice FROM Extras
UNION ALL
SELECT ItemPrice FROM Items
) foo
How can I add the names of the items (ItemName, ExtrasName) to my output? Again, there should only be one row as the output.
Try this:
SELECT TOP 1 FIRST_VALUE(Price) OVER (ORDER BY Price) AS MinPrice,
FIRST_VALUE(Name) OVER (ORDER BY Price) AS MinName,
LAST_VALUE(Price) OVER (ORDER BY Price DESC) AS MaxPrice,
LAST_VALUE(Name) OVER (ORDER BY Price DESC) AS MaxName
FROM (
SELECT ExtrasName AS Name, ExtrasPrice AS Price FROM Extras
UNION ALL
SELECT ItemName As Name, ItemPrice AS Price FROM Items) u
SQL Fiddle Demo
TOP 1 with order by clause should work for you. Try this
SELECT *
FROM (SELECT TOP 1 ExtrasPrice,ExtrasName
FROM Extras ORDER BY ExtrasPrice Asc),
(SELECT TOP 1 ItemPrice,ItemName
FROM Items ORDER BY ItemPrice Desc)
Note: Comma can be replaced with CROSS JOIN
You can use row_number() for this. If you are satisfied with two rows:
SELECT item, price
FROM (SELECT foo.*, row_number() over (order by price) as seqnum_asc,
row_number() over (order by price) as seqnum_desc
FROM (SELECT item, ExtrasPrice as price FROM Extras
UNION ALL
SELECT item, ItemPrice FROM Items
) foo
) t
WHERE seqnum_asc = 1 or seqnum_desc = 1;
EDIT:
If you have an index on "price" in both tables, then the cheapest method is probably:
with exp as (
(select top 1 item, ExtrasPrice as price
from Extras e
order by price desc
) union all
(select top 1 i.item, ItemPrice
from Items i
order by price desc
)
),
cheap as (
(select top 1 item, ExtrasPrice as price
from Extras e
order by price asc
) union all
(select top 1 i.item, ItemPrice
from Items i
order by price asc
)
)
select top 1 *
from exp
order by price desc
union all
select top 1 *
from cheap
order by price asc;
If you want this in one row, you can replace the final query with:
select e.*, c.*
from (select top 1 *
from exp
order by price desc
) e cross join
(select top 1 *
from cheap
order by price asc
) c

display max on one columns with multiple columns in output

How can I display maximum OrderId for a CustomerId with many columns?
I have a table with following columns:
CustomerId, OrderId, Status, OrderType, CustomerType
A customer with Same customer id could have many order ids(1,2,3..) I want to be able to display the max Order id with the rest of the customers in a sql view. how can I achieve this?
Sample Data:
CustomerId OrderId OrderType
145042 1 A
110204 1 C
145042 2 D
162438 1 B
110204 2 B
103603 1 C
115559 1 D
115559 2 A
110204 3 A
I'd use a common table expression and ROW_NUMBER:
;With Ordered as (
select *,
ROW_NUMBER() OVER (PARTITION BY CustomerID
ORDER BY OrderId desc) as rn
from [Unnamed table from the question]
)
select * from Ordered where rn = 1
select * from table_name
where orderid in
(select max(orderid) from table_name group by customerid)
One way to do this is with not exists:
select t.*
from table t
where not exists (select 1
from table t2
where t2.CustomerId = t.CustomerId and
t2.OrderId > t.OrderId
);
This is saying: "get me all rows from t where there is no higher order id for the customer."

SQL Server Group By Complex Query

In SQL Server, suppose we have a SALES_HISTORY table as below.
CustomerNo PurchaseDate ProductId
1 20120411 12
1 20120330 13
2 20120312 14
3 20120222 16
3 20120109 16
... and many records for each purchase of each customer...
How can I write the appropriate query for finding:
For each customer,
find the product he bought at MOST,
find the percentage of this product over all products he bought.
The result table must have columns like:
CustomerNo,
MostPurchasedProductId,
MostPurchasedProductPercentage
Assuming SQL Server 2005+, you can do the following:
;WITH CTE AS
(
SELECT *,
COUNT(*) OVER(PARTITION BY CustomerNo, ProductId) TotalProduct,
COUNT(*) OVER(PARTITION BY CustomerNo) Total
FROM YourTable
), CTE2 AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY CustomerNo
ORDER BY TotalProduct DESC)
FROM CTE
)
SELECT CustomerNo,
ProductId MostPurchasedProductId,
CAST(TotalProduct AS NUMERIC(16,2))/Total*100 MostPurchasedProductPercent
FROM CTE2
WHERE RN = 1
You still need to deal when you have more than one product as the most purchased one. Here is a sqlfiddle with a demo for you to try.
Could do a lot prettier, but it works:
with cte as(
select CustomerNo, ProductId, count(1) as c
from SALES_HISTORY
group by CustomerNo, ProductId)
select CustomerNo, ProductId as MostPurchasedProductId, (t.c * 1.0)/(select sum(c) from cte t2 where t.CustomerNo = t2.CustomerNo) as MostPurchasedProductPercentage
from cte t
where c = (select max(c) from cte t2 where t.CustomerNo = t2.CustomerNo)
SQL Fiddle