T-Sql find duplicate row values - sql-server-2005

I want to write a stored procedure.
In that stored procedure, I want to find duplicate row values in a table, sum them, and write the result back to the same table.
Let's say I have a CustomerSales table:
ID SalesRepresentative Customer Quantity
1 Michael CustA 55
2 Michael CustA 10
and I need to turn the table into:
ID SalesRepresentative Customer Quantity
1 Michael CustA 65
2 Michael CustA 0
When SalesRepresentative and Customer are both duplicated, I want to sum the Quantity values of those rows, assign the total to the first of them, and set the others to 0.
Could you help me?

To aggregate duplicates into one row:
SELECT min(ID) AS ID, SalesRepresentative, Customer
,sum(Quantity) AS Quantity
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
ORDER BY min(ID)
Or, if you actually want those extra rows with 0 as Quantity in the result:
SELECT ID, SalesRepresentative, Customer
,CASE
WHEN (count(*) OVER (PARTITION BY SalesRepresentative,Customer)) = 1
THEN Quantity
WHEN (row_number() OVER (PARTITION BY SalesRepresentative,Customer
ORDER BY ID)) = 1
THEN sum(Quantity) OVER (PARTITION BY SalesRepresentative,Customer)
ELSE 0
END AS Quantity
FROM CustomerSales
ORDER BY ID
This makes heavy use of window functions.
Alternative version without window functions:
SELECT min(ID) AS ID, SalesRepresentative, Customer, sum(Quantity) AS Quantity
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
UNION ALL
SELECT ID, SalesRepresentative, Customer, 0 AS Quantity
FROM CustomerSales c
LEFT JOIN (
SELECT min(ID) AS ID
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
) x ON (x.ID = c.ID)
WHERE x.ID IS NULL
ORDER BY ID
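As a quick sanity check, here is a minimal sketch that runs the window-function query on the sample rows. It uses Python's built-in sqlite3 module rather than SQL Server, and assumes SQLite 3.25 or newer for window functions. The third row (Sarah / CustB) is a hypothetical addition to show that non-duplicated rows keep their Quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerSales (
        ID INTEGER PRIMARY KEY,
        SalesRepresentative TEXT,
        Customer TEXT,
        Quantity INTEGER
    );
    INSERT INTO CustomerSales VALUES
        (1, 'Michael', 'CustA', 55),
        (2, 'Michael', 'CustA', 10),
        (3, 'Sarah',   'CustB', 7);  -- hypothetical row without a duplicate
""")

# Same query as in the answer: the first row of each group gets the sum,
# the remaining rows of the group get 0.
query = """
    SELECT ID, SalesRepresentative, Customer,
           CASE
               WHEN (count(*) OVER (PARTITION BY SalesRepresentative, Customer)) = 1
                   THEN Quantity
               WHEN (row_number() OVER (PARTITION BY SalesRepresentative, Customer
                                        ORDER BY ID)) = 1
                   THEN sum(Quantity) OVER (PARTITION BY SalesRepresentative, Customer)
               ELSE 0
           END AS Quantity
    FROM CustomerSales
    ORDER BY ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data the first Michael / CustA row carries 65, the second carries 0, and the Sarah row is unchanged.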

Related

Get the date for each duplicated row in SQL Server

I've made a query to get how many products are sold more than one time and it worked.
Now I want to show the transaction date for each of these duplicated sales, but when I add the date to the select it returns far fewer rows, so something is going wrong. The query without the date returns 9855 rows and with the date just 36 rows.
Here is the query I'm doing:
SELECT TransactionDate,
ProductName,
QtyOfSales = COUNT(*)
FROM product_sales
WHERE ProductID = 1 -- Product Sold ID
AND ProductName IS NOT NULL
GROUP BY ProductName,
TransactionDate
HAVING COUNT(*) > 1
Perhaps a subquery? Can you help in that regard?
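Adding TransactionDate to the GROUP BY makes COUNT(*) count rows per product and date, so only products sold more than once on the same date pass the HAVING filter. You can use the corresponding COUNT window function instead, which counts the transactions per "ProductName" while keeping each row's date: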
WITH cte AS(
SELECT TransactionDate,
ProductName,
COUNT(*) OVER(PARTITION BY ProductName) AS QtyOfSales
FROM product_sales
WHERE ProductID = 1 -- Product Sold ID
AND ProductName IS NOT NULL
)
SELECT DISTINCT TransactionDate,
ProductName
FROM cte
WHERE QtyOfSales > 1
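To see the difference between the two queries, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The three sample rows are hypothetical: 'Widget' is sold on two different dates and 'Gadget' only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_sales (
        ProductID INTEGER, ProductName TEXT, TransactionDate TEXT
    );
    -- Hypothetical sample data
    INSERT INTO product_sales VALUES
        (1, 'Widget', '2023-01-01'),
        (1, 'Widget', '2023-01-02'),
        (1, 'Gadget', '2023-01-03');
""")

# Grouping by date as well: every (name, date) group holds a single row,
# so HAVING COUNT(*) > 1 filters everything out.
grouped = conn.execute("""
    SELECT TransactionDate, ProductName, COUNT(*) AS QtyOfSales
    FROM product_sales
    WHERE ProductID = 1 AND ProductName IS NOT NULL
    GROUP BY ProductName, TransactionDate
    HAVING COUNT(*) > 1
""").fetchall()

# Window count per product name: each row keeps its own date.
windowed = conn.execute("""
    WITH cte AS (
        SELECT TransactionDate, ProductName,
               COUNT(*) OVER (PARTITION BY ProductName) AS QtyOfSales
        FROM product_sales
        WHERE ProductID = 1 AND ProductName IS NOT NULL
    )
    SELECT DISTINCT TransactionDate, ProductName
    FROM cte
    WHERE QtyOfSales > 1
    ORDER BY TransactionDate
""").fetchall()

print(grouped)
print(windowed)
```

The grouped query comes back empty, while the windowed one lists both Widget dates.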

Get number of customers having 1,2 and more than 3 products

I'm trying to figure out a query that shows the number of customers having 1, 2, and 3 or more products. Here are the table names and fields:
Product(prod_no, prod_cust_id)
Customer(cust_id)
Product
prod_no  | prod_cust_id
Cheetos1   WR123
Cheetos2   WR123
Lay1       WP232
Prings     WP678
Customer
cust_id
WN999
WR123
WP232
WP678
An example of the result I want to get is:
1 Product - 100 customer
2 Product - 52 customer
3 Products and above - 10 customer
Product | Customers
1         100
2         52
>=3       10
I tried with the following query
SELECT COUNT (DISTINCT PROD_NO)"Product", CUST_ID"Customers"
FROM PRODUCT, CUSTOMER
WHERE PROD_CUST_ID = CUST_ID
HAVING COUNT(PROD_NO) >= 3 --for 3 products and above
GROUP BY CUST_ID
But the result is not what I wanted; so close, yet so far. I only tried it for 3 products and above, but how do I combine that with the counts for 1 product and 2 products?
Please help me out, thanks.
One option would be to start by counting the distinct products per customer ( prod_no per prod_cust_id ), evaluating three or more products as an individual case within the conditional, and then counting the customers in each group, such as
WITH prod_cust AS
(
SELECT prod_cust_id,
DECODE( SIGN(COUNT(DISTINCT prod_no)-2),1,'>=3',
COUNT(DISTINCT prod_no) ) AS prod_cnt
FROM product
GROUP BY prod_cust_id
)
SELECT prod_cnt AS "Product", COUNT(*) AS "Customers"
FROM prod_cust
GROUP BY prod_cnt
ORDER BY 1
You can first count the number of products per customer in the Product table, and then count the customers in each bucket separately. You can try the query below:
WITH DATA AS (SELECT P.*, COUNT(*) OVER(PARTITION BY prod_cust_id) CNT
FROM Product P)
SELECT '1' Product, COUNT(DISTINCT CASE WHEN CNT = 1 THEN prod_cust_id ELSE NULL END) Customers
FROM DATA
UNION ALL
SELECT '2', COUNT(DISTINCT CASE WHEN CNT = 2 THEN prod_cust_id ELSE NULL END)
FROM DATA
UNION ALL
SELECT '>=3', COUNT(DISTINCT CASE WHEN CNT >= 3 THEN prod_cust_id ELSE NULL END)
FROM DATA;
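For what it's worth, here is a minimal sketch of the "count products per customer, then count customers per bucket" idea on the question's sample Product rows. It runs on SQLite through Python's sqlite3 module rather than Oracle, so it uses CASE instead of DECODE; the `per_customer` and `bucket` names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (prod_no TEXT, prod_cust_id TEXT);
    INSERT INTO Product VALUES
        ('Cheetos1', 'WR123'),
        ('Cheetos2', 'WR123'),
        ('Lay1',     'WP232'),
        ('Prings',   'WP678');
""")

query = """
    WITH per_customer AS (
        -- step 1: how many distinct products does each customer have?
        SELECT prod_cust_id, COUNT(DISTINCT prod_no) AS cnt
        FROM Product
        GROUP BY prod_cust_id
    )
    -- step 2: bucket the counts and count customers per bucket
    SELECT CASE WHEN cnt >= 3 THEN '>=3' ELSE CAST(cnt AS TEXT) END AS bucket,
           COUNT(*) AS customers
    FROM per_customer
    GROUP BY bucket
    ORDER BY bucket
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample rows, two customers have one product and one customer has two. Note that a bucket with no customers (here '>=3') simply does not appear in this version, and customers with no products at all are not counted.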

Getting the latest entry grouped by ID

I have a table with stock for products. The problem is that every time there is a stock change, the new value is stored, together with the new Quantity. Example:
ProductID | Quantity | LastUpdate
1 123 2019.01.01
2 234 2019.01.01
1 444 2019.01.02
2 222 2019.01.02
I therefore need to get the latest stock update for every Product and return this:
ProductID | Quantity
1 444
2 222
The following SQL works, but is slow.
SELECT ProductID, Quantity
FROM (
SELECT ProductID, Quantity
FROM Stock
WHERE LastUpdate
IN (SELECT MAX(LastUpdate) FROM Stock GROUP BY ProductID)
)
Since the query is slow and supposed to be left joined into another query, I really would like some input on how to do this better.
Is there another way?
Use analytic functions. row_number can be used in this case.
SELECT ProductID, Quantity
FROM (SELECT ProductID, Quantity, row_number() over(partition by ProductID order by LastUpdate desc) as rnum
FROM Stock
) s
WHERE RNUM = 1
Or with first_value.
SELECT DISTINCT ProductID, FIRST_VALUE(Quantity) OVER(partition by ProductID order by LastUpdate desc) as quantity
FROM Stock
Just another option is using WITH TIES in concert with Row_Number()
Full Disclosure: Vamsi's answer will be a nudge more performant.
Example
Select Top 1 with ties *
From YourTable
Order by Row_Number() over (Partition By ProductID Order by LastUpdate Desc)
Returns
ProductID Quantity LastUpdate
1 444 2019-01-02
2 222 2019-01-02
So you could use a CTE (Common Table Expression).
Base Data:
SELECT 1 AS ProductID
,123 AS Quantity
,'2019-01-01' as LastUpdate
INTO #table
UNION
SELECT 2 AS ProductID
,234 AS Quantity
,'2019-01-01' as LastUpdate
UNION
SELECT 1 AS ProductID
,444 AS Quantity
,'2019-01-02' as LastUpdate
UNION
SELECT 2 AS ProductID
,222 AS Quantity
,'2019-01-02' as LastUpdate
Here is the code using a Common Table Expression.
WITH CTE (ProductID, Quantity, LastUpdate, Rnk)
AS
(
SELECT ProductID
,Quantity
,LastUpdate
,ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY LastUpdate DESC) AS Rnk
FROM #table
)
SELECT ProductID, Quantity, LastUpdate
FROM CTE
WHERE rnk = 1
You could then join the CTE to whatever table you need.
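The row_number() function might be the most efficient option, but the big slowdown in your query is the IN applied to a subquery. It's a slightly tricky one, but a join is usually faster. Note also that the IN version compares LastUpdate against the maximum date of any product, whereas the join below matches on ProductID as well. This query should get what you want and be much faster.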
SELECT
a.ProductID
,a.Quantity
FROM stock as a
INNER JOIN (
SELECT
ProductID
,MAX(LastUpdate) as LastUpdate
FROM stock
GROUP BY ProductID
) b
ON a.ProductID = b.ProductId AND
a.LastUpdate = b.LastUpdate
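To compare the row_number() and join approaches on the question's sample data, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25+ for window functions). It also assumes LastUpdate is stored as ISO-formatted text, so that string ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Stock (ProductID INTEGER, Quantity INTEGER, LastUpdate TEXT);
    INSERT INTO Stock VALUES
        (1, 123, '2019-01-01'),
        (2, 234, '2019-01-01'),
        (1, 444, '2019-01-02'),
        (2, 222, '2019-01-02');
""")

# Approach 1: number the rows per product, newest first, and keep row 1.
by_row_number = conn.execute("""
    SELECT ProductID, Quantity
    FROM (SELECT ProductID, Quantity,
                 row_number() OVER (PARTITION BY ProductID
                                    ORDER BY LastUpdate DESC) AS rnum
          FROM Stock) s
    WHERE rnum = 1
    ORDER BY ProductID
""").fetchall()

# Approach 2: join back to the per-product maximum date.
by_join = conn.execute("""
    SELECT a.ProductID, a.Quantity
    FROM Stock AS a
    INNER JOIN (SELECT ProductID, MAX(LastUpdate) AS LastUpdate
                FROM Stock
                GROUP BY ProductID) b
        ON a.ProductID = b.ProductID AND a.LastUpdate = b.LastUpdate
    ORDER BY a.ProductID
""").fetchall()

print(by_row_number)
print(by_join)
```

Both return the same rows here. They can differ when a product has two rows with the same latest LastUpdate: the join returns both, while row_number() keeps only one of them.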

SQL Select Group By Min() - but select other

I want to select the ID of the Table Products with the lowest Price Grouped By Product.
ID Product Price
1 123 10
2 123 11
3 234 20
4 234 21
Which by logic would look like this:
SELECT
ID,
Min(Price)
FROM
Products
GROUP BY
Product
But I don't want to select the Price itself, just the ID.
Resulting in
1
3
EDIT: The DBMSes used are Firebird and Filemaker
You didn't specify your DBMS, so this is ANSI standard SQL:
select id
from (
select id,
row_number() over (partition by product order by price) as rn
from products
) t
where rn = 1
order by id;
If your DBMS doesn't support window functions, you can do that with joining against a derived table:
select o.id
from products o
join (
select product,
min(price) as min_price
from products
group by product
) t on t.product = o.product and t.min_price = o.price;
Note that this will return a slightly different result than the first solution: if the minimum price for a product occurs more than once, all those IDs will be returned. The first solution will only return one of them. If you don't want that, you need to group again in the outer query:
select min(o.id)
from products o
join (
select product,
min(price) as min_price
from products
group by product
) t on t.product = o.product and t.min_price = o.price
group by o.product;
SELECT ID
FROM Products as A
where price = ( select Min(Price)
from Products as B
where B.Product = A.Product )
GROUP BY id
This will show the ID of the cheapest row for each product, which in this case is 1 and 3.
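Here is a minimal sketch that checks the window-function and correlated-subquery versions against a Products table holding the question's sample rows. It uses SQLite via Python's sqlite3 module (3.25+ assumed for row_number()), not Firebird or Filemaker, so treat it as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ID INTEGER PRIMARY KEY, Product INTEGER, Price INTEGER);
    INSERT INTO Products VALUES
        (1, 123, 10),
        (2, 123, 11),
        (3, 234, 20),
        (4, 234, 21);
""")

# Window-function version: rank rows per product by price, keep the cheapest.
ids_window = conn.execute("""
    SELECT ID
    FROM (SELECT ID,
                 row_number() OVER (PARTITION BY Product ORDER BY Price) AS rn
          FROM Products) t
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

# Correlated-subquery version: keep rows whose price equals the product minimum.
ids_subquery = conn.execute("""
    SELECT ID
    FROM Products AS A
    WHERE Price = (SELECT MIN(Price)
                   FROM Products AS B
                   WHERE B.Product = A.Product)
    ORDER BY ID
""").fetchall()

print(ids_window)
print(ids_subquery)
```

Both versions return IDs 1 and 3 for this data.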

Finding customers that only bought items no one else bought

Below is a list of orders. Is there a way to find the person_id of the customers who have only bought products that no one else has bought?
CREATE TABLE orders
AS
SELECT product_id, person_id
FROM ( VALUES
( 1 , 1 ),
( 2 , 1 ),
( 2 , 2 ),
( 3 , 3 ),
( 12, 6 ),
( 10, 3 )
) AS t(product_id, person_id);
The result would be the following table:
| person_id |
|-----------|
| 3 |
| 6 |
Do I have to find all the people who did buy items no one else bought and create a table that doesn't include those people?
You want all the products purchased by the person to be unique.
select person_id
from (select t.*,
min(person_id) over (partition by product_id) as minp,
max(person_id) over (partition by product_id) as maxp
from t
) t
group by person_id
having sum(case when minp <> maxp then 1 else 0 end) = 0;
You are probably thinking "Huh? What does this do?".
The subquery calculates the minimum person and maximum person on each product. If these are the same, then that one person is the only purchaser.
The having then checks that there are no non-single-purchaser products for a given person.
Perhaps a more intuitive phrasing of the logic would be:
select person_id
from (select t.*,
count(distinct person_id) over (partition by product_id) as numpersons
from t
) t
group by person_id
having max(numpersons) = 1;
Alas, Postgres doesn't support COUNT(DISTINCT) as a window function.
The traditional self join with boolean aggregation
select o0.person_id
from
orders o0
left join
orders o1 on o0.product_id = o1.product_id and o0.person_id <> o1.person_id
group by o0.person_id
having bool_and(o1.product_id is null)
;
person_id
-----------
3
6
The inline view being joined gets all the product_ids that have only one person_id. Once those product_ids are found, they are joined back to the original customers table to get the person_ids. Note that this returns everyone who bought at least one such product, so people who also bought a shared product would still need to be filtered out.
SELECT person_id
FROM customers c1
INNER JOIN
(
SELECT product_id
FROM customers
GROUP BY product_id
HAVING COUNT(person_id ) = 1
) c2
ON c1.product_id = c2.product_id;
This is Gordon's logic using aggregates only:
SELECT person_id
FROM
(
SELECT product_id,
-- if count = 1 it's the only customer who bought this product
min(person_id) as person_id,
-- if the combination(person_id,product_id) is unique DISTINCT can be removed
count(distinct person_id) as cnt
FROM customers
GROUP BY product_id
) AS dt
GROUP BY person_id
HAVING max(cnt) = 1 -- only unique products
Here is another solution:
with unique_products as
(select product_id
from orders
group by product_id
having count(*) = 1)
select person_id
from orders
except
select person_id
from orders
where not exists
(select * from unique_products where unique_products.product_id = orders.product_id)
First, the identifiers of all products that appear in a single order are found. Then, from all the persons in orders, we subtract those who have at least one order for a product that is not in that set (i.e. all the persons who have ordered at least one product also ordered by somebody else).
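A quick way to try the EXCEPT approach on the question's sample orders is the sketch below. It runs the same query on SQLite through Python's sqlite3 module instead of Postgres, and sorts the result in Python since EXCEPT does not promise an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (product_id INTEGER, person_id INTEGER);
    INSERT INTO orders VALUES
        (1, 1), (2, 1), (2, 2), (3, 3), (12, 6), (10, 3);
""")

query = """
    WITH unique_products AS (
        -- products that appear in exactly one order
        SELECT product_id
        FROM orders
        GROUP BY product_id
        HAVING count(*) = 1
    )
    SELECT person_id
    FROM orders
    EXCEPT
    -- persons with at least one order for a product that is not unique
    SELECT person_id
    FROM orders
    WHERE NOT EXISTS (
        SELECT *
        FROM unique_products
        WHERE unique_products.product_id = orders.product_id
    )
"""
person_ids = sorted(row[0] for row in conn.execute(query))
print(person_ids)
```

Person 1 is dropped because product 2 was also bought by person 2, which leaves persons 3 and 6, matching the expected result in the question.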