T-SQL: Select partitions which have more than 1 row - sql

I'm currently using this query:
SELECT
PartGrp,VendorPn, customer, sum(sales) as totalSales,
ROW_NUMBER() OVER (PARTITION BY partgrp, vendorpn ORDER BY SUM(sales) DESC) AS seqnum
FROM
BG_Invoice
GROUP BY
PartGrp, VendorPn, customer
ORDER BY
PartGrp, VendorPn, totalSales DESC
It returns a result set like the one below: a list of sales records grouped by part group, product ID (VendorPn), and customer, with each customer's total sales and a sequence number partitioned by part group and product ID.
PartGrp   VendorPn      Customer  totalSales  seqnum
----------------------------------------------------
AGS-AS    002A0002-252  10021013  19307.00    1
AGS-AS    002A0006-86   10021013  33092.00    1
AGS-AS    010-63078-8   10020987  10866.00    1
AGS-SQ    B71040-39     10020997  7174.00     1
AGS-SQ    B71040-39     10020998  2.00        2
AIRFRAME  0130-25       10017232  1971.00     1
AIRFRAME  0130-25       10000122  1243.00     2
AIRFRAME  0130-25       10008637  753.00      3
HARDWARE  MS28775-261   10005623  214.00      1
M250      23066682      10013266  175.00      1
How can I filter the result set to only return rows which have more than 1 seqnum? I would like the result set to look like this
PartGrp   VendorPn      Customer  totalSales  seqnum
----------------------------------------------------
AGS-SQ    B71040-39     10020997  7174.00     1
AGS-SQ    B71040-39     10020998  2.00        2
AIRFRAME  0130-25       10017232  1971.00     1
AIRFRAME  0130-25       10000122  1243.00     2
AIRFRAME  0130-25       10008637  753.00      3
In the first result set, only VendorPn "B71040-39" and "0130-25" were purchased by multiple customers. All products with only one customer were removed. Note that my desired result set isn't simply seqnum > 1, because I still need the first seqnum in each partition.

I would change your query to be like this:
SELECT PartGrp, VendorPn, customer, totalSales, seqnum
FROM (
SELECT PartGrp,
VendorPn,
customer,
SUM(sales) AS totalSales,
ROW_NUMBER() OVER (PARTITION BY PartGrp, VendorPn ORDER BY SUM(sales) DESC) AS seqnum,
COUNT(*) OVER (PARTITION BY PartGrp, VendorPn) AS cnt
FROM BG_Invoice
GROUP BY PartGrp, VendorPn, customer
) AS t
-- A window function result cannot be referenced in HAVING, so filter in an outer query
WHERE cnt > 1
ORDER BY PartGrp, VendorPn, totalSales DESC
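The partition-count idea can be checked end to end with a small sketch. This is not T-SQL: it uses Python's built-in sqlite3 module as a stand-in engine (assuming the bundled SQLite is 3.25 or newer, which is when window functions were added), aggregates in a CTE before applying the window functions, and runs against hypothetical sample rows modeled on the question.

```python
import sqlite3

# Hypothetical invoice rows: (PartGrp, VendorPn, customer, sales).
rows = [
    ("AGS-AS", "002A0002-252", 10021013, 19307.00),
    ("AGS-SQ", "B71040-39", 10020997, 7000.00),
    ("AGS-SQ", "B71040-39", 10020997, 174.00),
    ("AGS-SQ", "B71040-39", 10020998, 2.00),
    ("AIRFRAME", "0130-25", 10017232, 1971.00),
    ("AIRFRAME", "0130-25", 10000122, 1243.00),
    ("AIRFRAME", "0130-25", 10008637, 753.00),
    ("M250", "23066682", 10013266, 175.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BG_Invoice (PartGrp TEXT, VendorPn TEXT, customer INTEGER, sales REAL)"
)
conn.executemany("INSERT INTO BG_Invoice VALUES (?, ?, ?, ?)", rows)

query = """
WITH totals AS (
    -- One row per (PartGrp, VendorPn, customer) with that customer's total sales.
    SELECT PartGrp, VendorPn, customer, SUM(sales) AS totalSales
    FROM BG_Invoice
    GROUP BY PartGrp, VendorPn, customer
),
numbered AS (
    -- Number the customers in each partition and count how many there are.
    SELECT totals.*,
           ROW_NUMBER() OVER (PARTITION BY PartGrp, VendorPn ORDER BY totalSales DESC) AS seqnum,
           COUNT(*) OVER (PARTITION BY PartGrp, VendorPn) AS cnt
    FROM totals
)
SELECT PartGrp, VendorPn, customer, totalSales, seqnum
FROM numbered
WHERE cnt > 1            -- keep only partitions with more than one customer
ORDER BY PartGrp, VendorPn, totalSales DESC
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the filter is on the partition-wide count rather than on seqnum, the first row of every multi-customer partition is retained.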

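You can try something like:
SELECT PartGrp, VendorPn, customer, totalSales, seqnum
FROM (
SELECT PartGrp, VendorPn, customer, SUM(sales) AS totalSales,
ROW_NUMBER() OVER (PARTITION BY PartGrp, VendorPn ORDER BY SUM(sales) DESC) AS seqnum
FROM BG_Invoice
GROUP BY PartGrp, VendorPn, customer
) AS t
-- seqnum is computed after grouping, so it must be filtered in an outer query, not in HAVING;
-- this filter returns only the rows after the first one in each partition
WHERE seqnum <> 1
ORDER BY PartGrp, VendorPn, totalSales DESC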

WITH CTE AS (
SELECT
PartGrp,VendorPn, customer, sum(sales) as totalSales,
ROW_NUMBER() OVER (PARTITION BY partgrp, vendorpn ORDER BY SUM(sales) DESC) AS seqnum
FROM
BG_Invoice
GROUP BY
PartGrp, VendorPn, customer)
SELECT DISTINCT
a.*
FROM
CTE a
JOIN
CTE b
ON a.PartGrp = b.PartGrp
AND a.VendorPn = b.VendorPn
WHERE
b.seqnum > 1
ORDER BY
a.PartGrp,
a.VendorPn,
a.totalSales DESC;

Related

Get NULL value when using an aggregate function

Here is the tables:
https://dbfiddle.uk/markdown?rdbms=sqlserver_2019&fiddle=effc94afe681b2dfdb3e2c02c2b005ea
I want to find the average TotalAmount of the last 3 operations (that is, the 3 highest OrderID values) for each customer. If a customer doesn't have 3 operations, the result should be NULL.
Here is my answer (T-SQL):
SELECT s.CustomerID,avg(s.TotalAmount) as AverageofLast3_operation
FROM (SELECT OrderID, CustomerID, EventDate, TotalAmount,
ROW_NUMBER() over (partition by CustomerID ORDER BY OrderID asc) as Row_num
FROM CustomerOperation
)s
WHERE s.Row_num>3
GROUP BY CustomerID
And the result is:
CustomerID  AverageofLast3_operation
1           7833
2           1966
According to the question, I should also have a row like this:
CustomerID  AverageofLast3_operation
3           NULL
How can I achieve this with T-SQL?
You need conditional aggregation:
SELECT CustomerID,
AVG(CASE WHEN counter >= 3 THEN TotalAmount END) AS AverageofLast3_operation
from (
SELECT OrderID, CustomerID, EventDate, TotalAmount,
ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderID DESC) AS Row_num,
COUNT(*) OVER (PARTITION BY CustomerID) counter
FROM CustomerOperation
) s
WHERE Row_num <= 3
GROUP BY CustomerID;
Or:
SELECT CustomerID,
CASE WHEN COUNT(*) = 3 THEN AVG(TotalAmount) END AS AverageofLast3_operation
from (
SELECT OrderID, CustomerID, EventDate, TotalAmount,
ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderID DESC) AS Row_num
FROM CustomerOperation
) s
WHERE Row_num <= 3
GROUP BY CustomerID;
See the demo.
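As a rough check of the second variant, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions). The table contents are hypothetical and are not the data from the linked fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerOperation (OrderID INTEGER, CustomerID INTEGER, TotalAmount REAL)"
)
# Hypothetical data: customer 1 has four orders, customer 2 has only two.
conn.executemany(
    "INSERT INTO CustomerOperation VALUES (?, ?, ?)",
    [
        (1, 1, 100.0), (2, 1, 200.0), (3, 1, 300.0), (4, 1, 400.0),
        (5, 2, 50.0), (6, 2, 70.0),
    ],
)

query = """
SELECT CustomerID,
       -- CASE without ELSE yields NULL when fewer than 3 rows survive the filter.
       CASE WHEN COUNT(*) = 3 THEN AVG(TotalAmount) END AS AverageofLast3_operation
FROM (
    SELECT CustomerID, TotalAmount,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderID DESC) AS Row_num
    FROM CustomerOperation
) AS s
WHERE Row_num <= 3          -- the three most recent orders per customer
GROUP BY CustomerID
ORDER BY CustomerID
"""

averages = dict(conn.execute(query).fetchall())
print(averages)
```

Customer 1 averages its three latest orders (400, 300, 200), while customer 2 still appears in the output, with NULL (None in Python) as its average.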
You can use a conditional average like so:
with t as (
select customerId,
case when
Row_Number() over(partition by customerid order by orderid desc) <=3 then totalamount
else 0 end TotalAmount,
Count(*) over (partition by customerid) cnt
from CustomerOperation
)
select customerId, Avg(case when cnt>=3 then totalamount end) as Average
from t
where totalAmount>0
group by CustomerId

How do I find the Sum and Max value per Unique ID in HIVE?

basically how do I turn
id name quantity
1 Jerry 1
1 Jerry 2
1 Nana 1
2 Max 4
2 Lenny 3
into
id name quantity
1 Jerry 3
2 Max 4
in HIVE?
I want to sum up and find the highest quantity for each unique ID
You can use window functions with aggregation:
select id, name, quantity
from (select id, name, sum(quantity) as quantity,
row_number() over (partition by id order by sum(quantity) desc) as seqnum
from t
group by id, name
) t
where seqnum = 1;
You can first calculate the sum of quantity per group, then rank them according to descending quantity, and finally filter the rows with rank = 1.
select
id, name, quantity
from (
select
*,
row_number() over (partition by id order by quantity desc) as rn
from (
select id, name, sum(quantity) as quantity
from mytable
group by id, name
) summed
) ranked
where rn = 1;
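The sum-then-rank approach can be tried on the sample data from the question. The sketch below is not Hive: it uses SQLite through Python's sqlite3 module as a stand-in (assuming SQLite 3.25 or newer), and the CTE names summed and ranked are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, quantity INTEGER)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "Jerry", 1), (1, "Jerry", 2), (1, "Nana", 1), (2, "Max", 4), (2, "Lenny", 3)],
)

query = """
WITH summed AS (
    -- Step 1: total quantity per (id, name).
    SELECT id, name, SUM(quantity) AS quantity
    FROM mytable
    GROUP BY id, name
),
ranked AS (
    -- Step 2: rank the names within each id by total quantity, highest first.
    SELECT id, name, quantity,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY quantity DESC) AS rn
    FROM summed
)
-- Step 3: keep only the top-ranked name per id.
SELECT id, name, quantity
FROM ranked
WHERE rn = 1
ORDER BY id
"""

top_per_id = conn.execute(query).fetchall()
print(top_per_id)
```

ROW_NUMBER() picks exactly one row per id even when two names tie on the total; RANK() would keep all tied names instead.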
Try the following:
with cte as
(
select id,name,sum(quantity) as q
from table_name group by id,name
) select id,name,q from cte t1
where t1.q=( select max(q) from cte t2 where t1.id=t2.id)

Selecting rows that have row_number more than 1

I have a table as following (using bigquery):
id   year  month  sales  row_number
111  2020  11     1000   1
111  2020  12     2000   2
112  2020  11     3000   1
113  2020  11     1000   1
Is there a way in which I can select rows that have row numbers more than one?
For example, my desired output is:
id   year  month  sales  row_number
111  2020  11     1000   1
111  2020  12     2000   2
I don't want to just exclusively select rows with row_number = 2 but also row_number = 1 as well.
The original code block I used for the first table result is:
SELECT
id,
year,
month,
SUM(sales) AS sales,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id ASC) AS row_number
FROM
table
GROUP BY
id, year, month
You can use window functions:
select t.* except (cnt)
from (select t.*,
count(*) over (partition by id) as cnt
from t
) t
where cnt > 1;
As applied to your aggregation query:
SELECT iym.* EXCEPT (cnt)
FROM (SELECT id, year, month,
SUM(sales) as sales,
ROW_NUMBER() OVER (Partition by id ORDER BY id ASC) AS row_number,
COUNT(*) OVER(Partition by id ORDER BY id ASC) AS cnt
FROM table
GROUP BY id, year, month
) iym
WHERE cnt > 1;
You can wrap your query as in below example
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (YOUR_ORIGINAL_QUERY)
)
where flag
so it can look as
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (
SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month
)
)
where flag
When applied to the sample data in your question, it returns the two rows for id 111.
Try this:
with tmp as (SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month)
select * from tmp a where exists ( select 1 from tmp b where a.id = b.id and b.row_number =2)
This is a straightforward use of an EXISTS subquery.
This is what I use. It is similar to @ElapsedSoul's answer, but as I understand it, IN is preferable to EXISTS for a static list. I'm not sure whether the performance difference, if any, is significant:
Difference between EXISTS and IN in SQL?
WITH T1 AS
(
SELECT
id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id ASC) AS ROW_NUM
FROM table
GROUP BY id, year, month
)
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T1 WHERE ROW_NUM > 1);

redshift: how to find row_number after grouping and aggregating?

Suppose I have a table of customer purchases ("my_table") like this:
--------------------------------------
customerid | date_of_purchase | price
-----------|------------------|-------
1 | 2019-09-20 | 20.23
2 | 2019-09-21 | 1.99
1 | 2019-09-21 | 123.34
...
I'd like to be able to find the nth highest spending customer in this table (say n = 5). So I tried this:
with cte as (
select customerid, sum(price) as total_pay,
row_number() over (partition by customerid order by total_pay desc) as rn
from my_table group by customerid order by total_pay desc)
select * from cte where rn = 5;
But this gives me nonsense results. For some reason rn doesn't seem to be unique (for example there are a bunch of customers with rn = 1). I don't understand why. Isn't rn supposed to be just a row number?
Remove the partition by in the definition of row_number():
with cte as (
select customerid, sum(price) as total_pay,
row_number() over (order by total_pay desc) as rn
from my_table
group by customerid
)
select *
from cte
where rn = 5;
You are already aggregating by customerid, so each customer has only one row. So the value of rn will always be 1.
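To see the effect of the PARTITION BY clause, the following sketch compares both forms on a few hypothetical purchases. It uses SQLite through Python's sqlite3 module rather than Redshift (assuming SQLite 3.25 or newer), and it aggregates in a CTE named totals before numbering the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (customerid INTEGER, date_of_purchase TEXT, price REAL)")
# Hypothetical purchases for three customers.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2019-09-20", 20.23),
        (2, "2019-09-21", 1.99),
        (1, "2019-09-21", 123.34),
        (3, "2019-09-22", 50.00),
    ],
)

template = """
WITH totals AS (
    SELECT customerid, SUM(price) AS total_pay
    FROM my_table
    GROUP BY customerid
)
SELECT customerid,
       ROW_NUMBER() OVER ({window}) AS rn
FROM totals
ORDER BY total_pay DESC
"""

# Partitioned by customerid: each partition holds a single aggregated row.
partitioned = conn.execute(
    template.format(window="PARTITION BY customerid ORDER BY total_pay DESC")
).fetchall()

# No partition: one numbering across all customers, highest spender first.
overall = conn.execute(template.format(window="ORDER BY total_pay DESC")).fetchall()

print(partitioned)
print(overall)
```

With the partition, every customer gets rn = 1; without it, rn runs from 1 upward in order of total spend, so filtering on rn = n selects the nth highest spender.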

Is there a way to do something like SQL NOT top statement?

I'm trying to write a SQL statement that gives me the top X records and then sums all the others into one row. The first part is easy...
select top 3 Department, Sum(sales) as TotalSales
from Sales
group by Department
What would be nice is if I union a second query something like...
select NOT top 3 "Others" as Department, Sum(sales) as TotalSales
from Sales
group by Department
... for a result set that looks like,
Department TotalSales
----------- -----------
Mens Clothes 120.00
Jewelry 113.00
Shoes 98.00
Others 312.00
Is there a way to do an equivalent to a NOT operator on a TOP? (I know I can probably make a temp table of the top X and work with that, but I'd prefer a solution that was just a single sql statement.)
WITH q AS
(
SELECT ROW_NUMBER() OVER (ORDER BY SUM(sales) DESC) rn,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY SUM(sales) DESC) <= 3 THEN
department
ELSE
'Others'
END AS dept,
SUM(sales) AS sales
FROM sales
GROUP BY
department
)
SELECT dept, SUM(sales)
FROM q
GROUP BY
dept
ORDER BY
MAX(rn)
WITH cte
As (SELECT Department,
Sum(sales) as TotalSales
from Sales
group by Department),
cte2
AS (SELECT *,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY TotalSales DESC) <= 3 THEN
ROW_NUMBER() OVER (ORDER BY TotalSales DESC)
ELSE 4
END AS Grp
FROM cte)
SELECT MAX(CASE
WHEN Grp = 4 THEN 'Others'
ELSE Department
END) AS Department,
SUM(TotalSales) AS TotalSales
FROM cte2
GROUP BY Grp
ORDER BY Grp
You can use a union to sum all other departments. A common table expression makes this a little bit more readable:
; with Top3Sales as
(
select top 3 Department
, Sum(sales) as TotalSales
from Sales
group by
Department
order by
Sum(sales) desc
)
select Department
, TotalSales
from Top3Sales
union all
select 'Other'
, SUM(Sales)
from Sales
where Department not in (select Department from Top3Sales)
Example at data.stackexchange.com.
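SELECT Department, TotalSales
FROM (SELECT TOP 3 Department, SUM(Sales) AS TotalSales
FROM Sales
GROUP BY Department
ORDER BY SUM(Sales) DESC) T -- TOP needs ORDER BY to pick the highest totals
UNION ALL
SELECT 'Others', SUM(s.Sales)
FROM Sales s
WHERE s.Department NOT IN
(SELECT Department
FROM (SELECT TOP 3 Department, SUM(Sales) AS TotalSales
FROM Sales
GROUP BY Department
ORDER BY SUM(Sales) DESC) D)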
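The ROW_NUMBER()-based answers can be exercised with a small sketch. It runs in SQLite through Python's sqlite3 module instead of SQL Server (assuming SQLite 3.25 or newer), and the departments beyond the three shown in the question, as well as the individual sale amounts, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Department TEXT, sales REAL)")
# Hypothetical sales; the four smallest departments add up to 312.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [
        ("Mens Clothes", 100.0), ("Mens Clothes", 20.0),
        ("Jewelry", 113.0),
        ("Shoes", 98.0),
        ("Toys", 90.0), ("Books", 80.0), ("Garden", 72.0), ("Misc", 70.0),
    ],
)

query = """
WITH totals AS (
    SELECT Department, SUM(sales) AS TotalSales
    FROM Sales
    GROUP BY Department
),
ranked AS (
    SELECT Department, TotalSales,
           ROW_NUMBER() OVER (ORDER BY TotalSales DESC) AS rn
    FROM totals
)
-- Departments ranked 1 to 3 keep their name; everything else collapses into 'Others'.
SELECT CASE WHEN rn <= 3 THEN Department ELSE 'Others' END AS Department,
       SUM(TotalSales) AS TotalSales
FROM ranked
GROUP BY CASE WHEN rn <= 3 THEN Department ELSE 'Others' END
ORDER BY MIN(rn)           -- top departments first, 'Others' last
"""

report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

This sketch assumes no real department is itself named 'Others'; if one could be, grouping on a numeric bucket (as in the Grp column of the cte2 answer) avoids the collision.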