Query for parents where all children have a pair/duplicate - sql

I'm looking for direction on a query that returns the TransIDs of transactions where every child row (that is, every product, qty, price combination) has a pair / duplicate. Example here:
TransID Product QTY Price
1 a 2 1.0
1 a 2 1.0
1 b 3 2.5
2 a 1 1.0
2 a 1 1.0
2 b 2 2.0
2 b 2 2.0
3 a 5 2.0
3 a 4 3.0
4 a 1 2.0
4 a 1 2.0
4 b 2 2.0
4 b 2 2.0
4 c 1 1.0
In this example, only transID 2 would be returned.
So far, I'm stuck along the lines of:
select transid, product, qty, price
, row_number() over (partition by transid, product, qty, price order by transID desc) rk
from x
But I think I'm on the wrong track there. Appreciate any direction.

You can do this using count() instead of row_number():
select transid
from (select x.*,
count(*) over (partition by transid, product, qty, price) as cnt
from x
) x
group by transid
having min(cnt) > 1;
However, that is sort of overkill. You could also use group by in the subquery:
select transid
from (select transid, product, qty, price, count(*) as cnt
from x
group by transid, product, qty, price
) x
group by transid
having min(cnt) > 1;
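As a quick check of the grouped version, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and runs the query. The harness, the function name fully_paired_transactions, the subquery alias g and the added order by are assumptions made for illustration. The original question does not name a database engine.

```python
import sqlite3

# Sample rows copied from the question: (TransID, Product, QTY, Price).
ROWS = [
    (1, "a", 2, 1.0), (1, "a", 2, 1.0), (1, "b", 3, 2.5),
    (2, "a", 1, 1.0), (2, "a", 1, 1.0), (2, "b", 2, 2.0), (2, "b", 2, 2.0),
    (3, "a", 5, 2.0), (3, "a", 4, 3.0),
    (4, "a", 1, 2.0), (4, "a", 1, 2.0), (4, "b", 2, 2.0), (4, "b", 2, 2.0),
    (4, "c", 1, 1.0),
]

# Count each (transid, product, qty, price) combination, then keep only the
# transactions whose smallest combination count is above 1.
QUERY = """
select transid
from (select transid, product, qty, price, count(*) as cnt
      from x
      group by transid, product, qty, price
     ) g
group by transid
having min(cnt) > 1
order by transid
"""


def fully_paired_transactions(rows):
    """Return the TransIDs in which every child row has at least one duplicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table x (transid integer, product text, qty integer, price real)"
    )
    conn.executemany("insert into x values (?, ?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(fully_paired_transactions(ROWS))  # [2]
```

Transactions 1 and 4 each contain one unpaired row and transaction 3 has two different rows, so only transaction 2 survives the min(cnt) filter.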

If I understand correctly, this should get you the answer you want:
CREATE TABLE dbo.SampleData (TransID int, Product char(1), Qty int, Price decimal(2,1));
INSERT INTO dbo.SampleData (TransID,
Product,
Qty,
Price)
VALUES (1,'a',2,1.0),
(1,'a',2,1.0),
(1,'a',2,1.0),
(1,'b',3,2.5),
(2,'a',1,1.0),
(2,'a',1,1.0),
(2,'b',2,2.0),
(2,'b',2,2.0),
(3,'a',5,2.0),
(3,'a',4,3.0);
WITH Counts AS (
SELECT TransID,Product,Qty,Price,
COUNT(*) AS Dups
FROM dbo.SampleData
GROUP BY TransID, Product, Qty, Price)
SELECT TransID
FROM Counts
GROUP BY TransID
HAVING MIN(Dups) >= 2;
DROP TABLE dbo.SampleData;

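Check for IDs where there does not exist a row without a duplicate. One way to express this is with EXCEPT: start from all TransIDs and remove every TransID that has a (Product, QTY, Price) combination appearing only once.

select TransID
from x
except
select TransID
from x
group by TransID, Product, QTY, Price
having count(*) = 1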

Try this query:
select transid,
product,
qty,
price
from (
select transid,
product,
qty,
price,
count(*) over (partition by transid, product) cntproduct,
count(*) over (partition by transid, qty) cntqty,
count(*) over (partition by transid, price) cntprice
from my_table
) a where cntprice > 1 and cntproduct > 1 and cntqty > 1

SELECT DISTINCT TransID FROM
(
SELECT COUNT(*) AS Cnt, TransID, Product, QTY, Price
FROM x
GROUP BY TransID, Product, QTY, Price
HAVING COUNT(*) = 2
) AS Table1
WHERE TransID NOT IN
(
SELECT TransID
FROM x
GROUP BY TransID, Product, QTY, Price
HAVING COUNT(*) = 1
);
Then read the TransID. Done!

Related

Get NULL value when using an aggregate function

Here are the tables:
https://dbfiddle.uk/markdown?rdbms=sqlserver_2019&fiddle=effc94afe681b2dfdb3e2c02c2b005ea
I want to find the average TotalAmount of the last 3 operations (that is, the last 3 OrderIDs) for each customer. If a customer has fewer than 3 operations, the result should be NULL.
Here is my answer (T-SQL):
SELECT s.CustomerID,avg(s.TotalAmount) as AverageofLast3_operation
FROM (SELECT OrderID, CustomerID, EventDate, TotalAmount,
ROW_NUMBER() over (partition by CustomerID ORDER BY OrderID asc) as Row_num
FROM CustomerOperation
)s
WHERE s.Row_num>3
GROUP BY CustomerID
And the result is:
CustomerID AverageofLast3_operation
1 7833
2 1966
According to the question, I should also have a row like this:
CustomerID AverageofLast3_operation
3 NULL
How can I achieve this with T-SQL?
You need conditional aggregation:
SELECT CustomerID,
AVG(CASE WHEN counter >= 3 THEN TotalAmount END) AS AverageofLast3_operation
from (
SELECT OrderID, CustomerID, EventDate, TotalAmount,
ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderID DESC) AS Row_num,
COUNT(*) OVER (PARTITION BY CustomerID) counter
FROM CustomerOperation
) s
WHERE Row_num <= 3
GROUP BY CustomerID;
Or:
SELECT CustomerID,
CASE WHEN COUNT(*) = 3 THEN AVG(TotalAmount) END AS AverageofLast3_operation
from (
SELECT OrderID, CustomerID, EventDate, TotalAmount,
ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderID DESC) AS Row_num
FROM CustomerOperation
) s
WHERE Row_num <= 3
GROUP BY CustomerID;
See the demo.
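To see the NULL behaviour of the second query, here is a minimal sketch run against SQLite through Python's sqlite3 module (it needs an SQLite build with window functions, 3.25 or later). The rows in ORDERS are hypothetical, since the real data lives in the linked fiddle, and the function name average_of_last_three and the added ORDER BY are assumptions for illustration.

```python
import sqlite3

# Hypothetical rows (OrderID, CustomerID, TotalAmount); not the fiddle's data.
ORDERS = [
    (1, 1, 10.0), (2, 1, 20.0), (3, 1, 30.0), (4, 1, 40.0),  # customer 1: four orders
    (5, 2, 5.0), (6, 2, 15.0),                               # customer 2: two orders
]

# Keep each customer's three most recent orders, then return the average
# only when exactly three rows survived; otherwise CASE yields NULL.
QUERY = """
SELECT CustomerID,
       CASE WHEN COUNT(*) = 3 THEN AVG(TotalAmount) END AS AverageofLast3_operation
FROM (
    SELECT CustomerID, TotalAmount,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderID DESC) AS Row_num
    FROM CustomerOperation
) s
WHERE Row_num <= 3
GROUP BY CustomerID
ORDER BY CustomerID
"""


def average_of_last_three(orders):
    """Return (CustomerID, average) pairs; the average is None below three orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CustomerOperation "
        "(OrderID INTEGER, CustomerID INTEGER, TotalAmount REAL)"
    )
    conn.executemany("INSERT INTO CustomerOperation VALUES (?, ?, ?)", orders)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(average_of_last_three(ORDERS))  # [(1, 30.0), (2, None)]
```

Customer 1 keeps orders 2, 3 and 4 (average 30.0), while customer 2 still appears in the output with a NULL average because the filtering happens inside the CASE rather than in the WHERE clause.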
You can use a conditional average like so:
with t as (
select customerId,
case when
Row_Number() over(partition by customerid order by orderid desc) <=3 then totalamount
else 0 end TotalAmount,
Count(*) over (partition by customerid) cnt
from CustomerOperation
)
select customerId, Avg(case when cnt>=3 then totalamount end) as Average
from t
where totalAmount>0
group by CustomerId

How do I find the Sum and Max value per Unique ID in HIVE?

Basically, how do I turn
id name quantity
1 Jerry 1
1 Jerry 2
1 Nana 1
2 Max 4
2 Lenny 3
into
id name quantity
1 Jerry 3
2 Max 4
in HIVE?
I want to sum up and find the highest quantity for each unique ID
You can use window functions with aggregation:
select id, name, quantity
from (select id, name, sum(quantity) as quantity,
row_number() over (partition by id order by sum(quantity) desc) as seqnum
from t
group by id, name
) t
where seqnum = 1;
You can first calculate the sum of quantity per group, then rank them according to descending quantity, and finally filter the rows with rank = 1.
select
id, name, quantity
from (
select
*,
row_number() over (partition by id order by quantity desc) as rn
from (
select id, name, sum(quantity) as quantity
from mytable
group by id, name
) s
) r where rn = 1;
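The aggregate-then-rank steps can be tried on the question's sample rows with a minimal sketch like the one below. It runs on SQLite through Python's sqlite3 module rather than on Hive, so it only illustrates the logic. The subquery aliases s and r, the function name top_name_per_id and the final order by are assumptions for illustration.

```python
import sqlite3

# Sample rows copied from the question: (id, name, quantity).
ROWS = [(1, "Jerry", 1), (1, "Jerry", 2), (1, "Nana", 1), (2, "Max", 4), (2, "Lenny", 3)]

# Inner query: total quantity per (id, name).
# Middle query: rank the totals within each id, largest first.
# Outer query: keep the top-ranked name for each id.
QUERY = """
select id, name, quantity
from (
    select id, name, quantity,
           row_number() over (partition by id order by quantity desc) as rn
    from (
        select id, name, sum(quantity) as quantity
        from mytable
        group by id, name
    ) s
) r
where rn = 1
order by id
"""


def top_name_per_id(rows):
    """Return one (id, name, summed quantity) row per id, picking the largest sum."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (id integer, name text, quantity integer)")
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(top_name_per_id(ROWS))  # [(1, 'Jerry', 3), (2, 'Max', 4)]
```

Note that row_number() picks an arbitrary row when two names tie on the summed quantity; rank() would return all tied names instead.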
Try something like this:
with cte as
(
select id,name,sum(quantity) as q
from table_name group by id,name
) select id,name,q from cte t1
where t1.q=( select max(q) from cte t2 where t1.id=t2.id)

List the most up-to-date product of each category, PostgreSQL queries

user_id product_id category_id date_added date_update
1 2 1 2.3.2021 null
1 3 1 2.3.2020 2.4.2023
1 4 2 2.3.2020 null
1 5 2 2.3.2020 2.4.2023
2 5 2 2.3.2020 2.4.2023
2 4 1 2.3.2020 null
List the most up-to-date product of each category
You can use row_number():
select * from
(
select *, row_number() over (partition by user_id, category_id order by date_update desc nulls last) as rn
from tablename
) A where rn = 1
Or you can also use distinct on:
select distinct on (user_id, category_id) *
from tablename
order by user_id, category_id, date_update desc nulls last
List the most up-to-date product of each category
You can use distinct on. Let me assume that if the update date is null, then you want the creation date:
select distinct on (category_id) t.*
from t
order by category_id, coalesce(date_update, date_added) desc;
If you wanted this per user/category combination, the logic would be:
select distinct on (user_id, category_id) t.*
from t
order by user_id, category_id, coalesce(date_update, date_added) desc;
Using a window function:
select * from (
select u_id, c_id, p_id, coalesce(date_update, date_added) as date,
rank() over (partition by u_id, c_id order by coalesce(date_update, date_added) desc) as r
from inventory
) t where r = 1
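The per-user, per-category ranking with a coalesce fallback can be sketched as follows. This is an illustration run on SQLite through Python's sqlite3 module, not on PostgreSQL, so it uses row_number() instead of distinct on. The dates are rewritten as ISO strings on the assumption that the question's values are day.month.year, and the table name inventory, the question's column names and the function name latest_products are assumptions as well.

```python
import sqlite3

# Rows from the question as (user_id, product_id, category_id, date_added, date_update),
# with dates converted to ISO text (assumed day.month.year) so they sort correctly.
ROWS = [
    (1, 2, 1, "2021-03-02", None),
    (1, 3, 1, "2020-03-02", "2023-04-02"),
    (1, 4, 2, "2020-03-02", None),
    (1, 5, 2, "2020-03-02", "2023-04-02"),
    (2, 5, 2, "2020-03-02", "2023-04-02"),
    (2, 4, 1, "2020-03-02", None),
]

# Rank products inside each (user, category) by their most recent date,
# falling back to date_added when date_update is NULL, and keep the newest.
QUERY = """
select user_id, category_id, product_id
from (
    select user_id, category_id, product_id,
           row_number() over (
               partition by user_id, category_id
               order by coalesce(date_update, date_added) desc
           ) as rn
    from inventory
) t
where rn = 1
order by user_id, category_id
"""


def latest_products(rows):
    """Return (user_id, category_id, product_id) for the newest product per pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table inventory (user_id integer, product_id integer, "
        "category_id integer, date_added text, date_update text)"
    )
    conn.executemany("insert into inventory values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(latest_products(ROWS))  # [(1, 1, 3), (1, 2, 5), (2, 1, 4), (2, 2, 5)]
```

For user 1 and category 1, product 3 wins because its update date (2023) is later than the creation date of product 2 (2021), which was never updated.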

SQL sum grouped by field with all rows

I have this table:
id sale_id price
-------------------
1 1 100
2 1 200
3 2 50
4 3 50
I want this result:
id sale_id price sum(price by sale_id)
------------------------------------------
1 1 100 300
2 1 200 300
3 2 50 50
4 3 50 50
I tried this:
SELECT id, sale_id, price,
(SELECT sum(price) FROM sale_lines GROUP BY sale_id)
FROM sale_lines
But get the error that subquery returns different number of rows.
How can I do it?
I want all the rows of sale_lines table selecting all fields and adding the sum(price) grouped by sale_id.
You can use a window function:
sum(price) over (partition by sale_id) as sum
If you want a subquery instead, you need to correlate it with the outer row:
SELECT sl.id, sl.sale_id, sl.price,
(SELECT sum(sll.price)
FROM sale_lines sll
WHERE sl.sale_id = sll.sale_id
)
FROM sale_lines sl;
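To confirm that the window function and the correlated subquery give the same rows, here is a minimal sketch using the question's data in an in-memory SQLite database through Python's sqlite3 module (window functions need SQLite 3.25 or later). The harness, the column alias total, the function name run_both and the added ORDER BY clauses are assumptions for illustration.

```python
import sqlite3

# Rows copied from the question: (id, sale_id, price).
SALE_LINES = [(1, 1, 100), (2, 1, 200), (3, 2, 50), (4, 3, 50)]

# Window version: every row keeps its own columns and also sees its group total.
WINDOW_QUERY = """
SELECT id, sale_id, price,
       SUM(price) OVER (PARTITION BY sale_id) AS total
FROM sale_lines
ORDER BY id
"""

# Correlated version: the inner query is re-evaluated for each outer row's sale_id.
CORRELATED_QUERY = """
SELECT sl.id, sl.sale_id, sl.price,
       (SELECT SUM(sll.price)
        FROM sale_lines sll
        WHERE sll.sale_id = sl.sale_id) AS total
FROM sale_lines sl
ORDER BY sl.id
"""


def run_both(rows):
    """Run both queries on the same data and return (windowed, correlated) results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale_lines (id INTEGER, sale_id INTEGER, price INTEGER)")
    conn.executemany("INSERT INTO sale_lines VALUES (?, ?, ?)", rows)
    windowed = conn.execute(WINDOW_QUERY).fetchall()
    correlated = conn.execute(CORRELATED_QUERY).fetchall()
    conn.close()
    return windowed, correlated


windowed, correlated = run_both(SALE_LINES)
print(windowed)  # [(1, 1, 100, 300), (2, 1, 200, 300), (3, 2, 50, 50), (4, 3, 50, 50)]
print(windowed == correlated)  # True
```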
Don't use GROUP BY in the subquery; make it a correlated subquery:
SELECT sl1.id, sl1.sale_id, sl1.price,
(SELECT sum(sl2.price) FROM sale_lines sl2 where sl2.sale_id = sl1.sale_id) as total
FROM sale_lines sl1
In addition to the other approaches, you can use CROSS APPLY to get the sum:
SELECT id, sale_id, price, Price_Sum
FROM YourTable AS ot
CROSS APPLY
(SELECT SUM(price) AS Price_Sum
FROM YourTable
WHERE sale_id = ot.sale_id) AS s;
SELECT t1.*,
total_price
FROM `sale_lines` AS t1
JOIN(SELECT Sum(price) AS total_price,
sale_id
FROM sale_lines
GROUP BY sale_id) AS t2
ON t1.sale_id = t2.sale_id

T-SQL: Select partitions which have more than 1 row

I've managed to use this query
SELECT
PartGrp,VendorPn, customer, sum(sales) as totalSales,
ROW_NUMBER() OVER (PARTITION BY partgrp, vendorpn ORDER BY SUM(sales) DESC) AS seqnum
FROM
BG_Invoice
GROUP BY
PartGrp, VendorPn, customer
ORDER BY
PartGrp, VendorPn, totalSales DESC
To get a result set like this. A list of sales records grouped by a group, a product ID (VendorPn), a customer, the customer's sales, and a sequence number which is partitioned by the group and the productID.
PartGrp VendorPn Customer totalSales seqnum
------------------------------------------------------------
AGS-AS 002A0002-252 10021013 19307.00 1
AGS-AS 002A0006-86 10021013 33092.00 1
AGS-AS 010-63078-8 10020987 10866.00 1
AGS-SQ B71040-39 10020997 7174.00 1
AGS-SQ B71040-39 10020998 2.00 2
AIRFRAME 0130-25 10017232 1971.00 1
AIRFRAME 0130-25 10000122 1243.00 2
AIRFRAME 0130-25 10008637 753.00 3
HARDWARE MS28775-261 10005623 214.00 1
M250 23066682 10013266 175.00 1
How can I filter the result set to only return rows which have more than 1 seqnum? I would like the result set to look like this
PartGrp VendorPn Customer totalSales seqnum
------------------------------------------------------------
AGS-SQ B71040-39 10020997 7174.00 1
AGS-SQ B71040-39 10020998 2.00 2
AIRFRAME 0130-25 10017232 1971.00 1
AIRFRAME 0130-25 10000122 1243.00 2
AIRFRAME 0130-25 10008637 753.00 3
Out of the first result set example, only rows with VendorPn "B71040-39" and "0130-25" had multiple customers purchase the product. All products which had only 1 customer were removed. Note that my desired result set isn't simply seqnum > 1, because I still need the first seqnum per partition.
I would change your query to count the rows per partition and filter on that count in an outer query (a window function cannot be referenced in HAVING):
SELECT PartGrp, VendorPn, customer, totalSales, seqnum
FROM (
SELECT PartGrp,
VendorPn,
customer,
sum(sales) as totalSales,
ROW_NUMBER() OVER (PARTITION BY partgrp,vendorpn ORDER BY SUM(sales) DESC) as seqnum,
COUNT(*) OVER (PARTITION BY partgrp,vendorpn) as cnt
FROM BG_Invoice
GROUP BY PartGrp,VendorPn, customer
) t
WHERE cnt > 1
ORDER BY PartGrp,VendorPn, totalSales desc
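The count-per-partition idea can be checked with a minimal sketch on SQLite through Python's sqlite3 module. The invoice rows below are hypothetical (the question only shows aggregated totals, not the raw BG_Invoice rows), and the aggregation is moved into its own subquery with the assumed alias g so the window functions operate on already-grouped rows. The function name multi_customer_partitions is also an assumption.

```python
import sqlite3

# Hypothetical raw invoice lines: (PartGrp, VendorPn, customer, sales).
INVOICES = [
    ("AGS-SQ", "B71040-39", "10020997", 7000.0),
    ("AGS-SQ", "B71040-39", "10020997", 174.0),
    ("AGS-SQ", "B71040-39", "10020998", 2.0),
    ("HARDWARE", "MS28775-261", "10005623", 214.0),
    ("M250", "23066682", "10013266", 175.0),
]

# g: total sales per (PartGrp, VendorPn, customer).
# t: number the customers inside each (PartGrp, VendorPn) and count them.
# Outer query: keep only partitions that contain more than one customer.
QUERY = """
SELECT PartGrp, VendorPn, customer, totalSales, seqnum
FROM (
    SELECT g.*,
           ROW_NUMBER() OVER (PARTITION BY PartGrp, VendorPn
                              ORDER BY totalSales DESC) AS seqnum,
           COUNT(*) OVER (PARTITION BY PartGrp, VendorPn) AS cnt
    FROM (
        SELECT PartGrp, VendorPn, customer, SUM(sales) AS totalSales
        FROM BG_Invoice
        GROUP BY PartGrp, VendorPn, customer
    ) g
) t
WHERE cnt > 1
ORDER BY PartGrp, VendorPn, totalSales DESC
"""


def multi_customer_partitions(rows):
    """Return the ranked rows of every (PartGrp, VendorPn) with 2+ customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE BG_Invoice (PartGrp TEXT, VendorPn TEXT, customer TEXT, sales REAL)"
    )
    conn.executemany("INSERT INTO BG_Invoice VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in multi_customer_partitions(INVOICES):
    print(row)
```

With these rows only the AGS-SQ / B71040-39 partition has two customers, so both of its rows (seqnum 1 and 2) are returned and the single-customer products are dropped.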
You can try something like:
SELECT PartGrp,VendorPn, customer, sum(sales) as totalSales,
ROW_NUMBER() OVER (PARTITION BY partgrp,vendorpn ORDER BY SUM(sales) DESC) as seqnum
FROM BG_Invoice
GROUP BY PartGrp,VendorPn, customer
HAVING seqnum <> '1'
ORDER BY PartGrp,VendorPn, totalSales desc
WITH CTE AS (
SELECT
PartGrp,VendorPn, customer, sum(sales) as totalSales,
ROW_NUMBER() OVER (PARTITION BY partgrp, vendorpn ORDER BY SUM(sales) DESC) AS seqnum
FROM
BG_Invoice
GROUP BY
PartGrp, VendorPn, customer)
SELECT DISTINCT
a.*
FROM
CTE a
JOIN
CTE b
ON a.PartGrp = b.PartGrp
AND a.VendorPn = b.VendorPn
WHERE
b.seqnum > 1
ORDER BY
a.PartGrp,
a.VendorPn,
a.totalSales DESC;