Getting Average based on 2 conditions (columns and rows) - sql

My data looks like this:
PRODUCT DEPT DATE PERCENTAGE
1 A JAN 2
1 B FEB 4
1 A MAR 1
1 B JAN 5
1 A FEB 3
1 B MAR 7
1 A JAN 3
1 B FEB 4
1 A MAR 2
1 B JAN 8
1 A FEB 9
1 B MAR 6
... ... ... ...
With thousands of different products and dozens of departments.
The calculation I have to go through is:
1 - Sum the percentages by product, dept and date (so Product 1 / DEPT A / JAN => SUM(PERCENTAGE)), for each PRODUCT, DEPT and DATE.
2 - When I have my sums, get the average of the 3 months for each product and dept (product 1 dept A: JAN / FEB / MAR, and so on)
3 - Get the max average (for each product, which dept has the highest average).
I have something which works but it's so long I am sure I can learn and make something better:
Select
Verylong_q.TFC,
Round(MAX(verylong_q.average),2) AS HIGHEST_AVERAGE
FROM
(
SELECT
Long_Q.TFC,
Long_Q.DEPT,
Long_Q.Percentage1,
Long_Q.Percentage2,
Long_Q.Percentage3,
((Percentage1 + Percentage2 + Percentage3)/3) AS Average
FROM
(
SELECT
t_Month1.TFC,
t_Month1.DEPT,
t_Month1.Percentage1,
t_Month2.Percentage2,
t_Month3.Percentage3
From
(
Select
pos.TFC,
mv.Dept AS DEPT,
sum(pos.percentage) AS Percentage1
FROM
TBO_POS pos,
TBL_MV mv
Where
pos.IV_ID = mv.IV_ID
and Date = […]
and TFC in […]
group by pos.TFC, mv.Dept, pos.Date
order by 1 DESC ) t_Month1
LEFT JOIN
(
Select
pos.TFC,
mv.Dept AS DEPT,
sum(pos.percentage) AS Percentage2
FROM
TBO_POS pos,
TBL_MV mv
Where
pos.IV_ID = mv.IV_ID
and Date = […]
and TFC in […]
group by pos.TFC, mv.Dept, pos.Date
order by 1 DESC ) t_Month2
On t_month1.DEPT = t_month2.DEPT and t_month1.TFC = t_month2.TFC
LEFT JOIN
(
Select
pos.TFC,
mv.Dept AS DEPT,
sum(pos.percentage) AS Percentage3
FROM
TBO_POS pos,
TBL_MV mv
Where
pos.IV_ID = mv.IV_ID
and Date = […]
and TFC in […]
group by pos.TFC, mv.Dept, pos.Date
order by 1 DESC ) t_Month3
on t_month1.DEPT = t_month3.DEPT and t_month1.TFC = t_month3.TFC
) Long_Q
) VeryLong_Q
Group by verylong_q.TFC
How could I do this in a better way? Thanks!

Isn't that simply:
Sum the percentages by product, dept and date in the innermost subquery
Get the average of the months for each product and dept in the next subquery
Get the max average for each product in the main query.
Query:
select product, max(avg_sum_percentage)
from
(
select product, dept, avg(sum_percentage) as avg_sum_percentage
from
(
select product, dept, date, sum(percentage) as sum_percentage
from mytable
group by product, dept, date
) per_product_dept_date
group by product, dept
) per_product_dept
group by product;
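As a quick check, the nested query can be run against the twelve sample rows from the question. This is only a sketch: it uses an in-memory SQLite database through Python's sqlite3 module rather than the asker's database, and the table name mytable and column types are assumptions.

```python
import sqlite3

# The twelve sample rows from the question: (product, dept, date, percentage).
rows = [
    (1, "A", "JAN", 2), (1, "B", "FEB", 4), (1, "A", "MAR", 1),
    (1, "B", "JAN", 5), (1, "A", "FEB", 3), (1, "B", "MAR", 7),
    (1, "A", "JAN", 3), (1, "B", "FEB", 4), (1, "A", "MAR", 2),
    (1, "B", "JAN", 8), (1, "A", "FEB", 9), (1, "B", "MAR", 6),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (product INTEGER, dept TEXT, date TEXT, percentage REAL)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

query = """
select product, max(avg_sum_percentage) as highest_average
from (
    -- step 2: average of the monthly sums per product and dept
    select product, dept, avg(sum_percentage) as avg_sum_percentage
    from (
        -- step 1: sum per product, dept and date
        select product, dept, date, sum(percentage) as sum_percentage
        from mytable
        group by product, dept, date
    ) per_product_dept_date
    group by product, dept
) per_product_dept
-- step 3: highest average per product
group by product
"""
result = conn.execute(query).fetchall()
print(result)
```

For product 1, dept A has monthly sums 5, 12 and 3 (average about 6.67) and dept B has 13, 8 and 13 (average about 11.33), so the query returns a single row for product 1 with a value of roughly 11.33.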

From what you describe, lag() seems like the appropriate method, along with aggregation and selection of the best:
select *
from (select product, dept, (sump + sump_1 + sump_2) / 3 as avg_max,
row_number() over (partition by product order by (sump + sump_1 + sump_2) / 3 desc) as seqnum
from (select product, dept, date, sum(percentage) as sump,
lag(sum(percentage)) over (partition by product, dept order by date) as sump_1,
lag(sum(percentage), 2) over (partition by product, dept order by date) as sump_2
from TBO_POS pos join
TBL_MV mv
on pos.IV_ID = mv.IV_ID
where Date = […] and TFC in […]
group by product, dept, date
) t
) t
where seqnum = 1;
This solution follows the description of the problem. It produces one row for each month and product. This version does not take into account missing values and other issues. I think this is the logic you want, but without expected results the question might be ambiguous.

Related

What can I use in place of UNION in my query, or how can I write a more optimized query without UNION and UNION ALL?

I am counting the birthdays, sales and orders in each of the 12 months from a customers table in SQL Server, like this.
In the customers table, birth_date, sale_date and order_date are columns of the table:
select 1 as ranking,'Birthdays' as Type,[MONTH],TOTAL
from ( select DATENAME(month, birth_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, birth_date)
)x
union
select 2 as ranking,'sales' as Type,[MONTH],TOTAL
from ( select DATENAME(month, sale_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, sale_date)
)x
union
select 3 as ranking,'Orders' as Type,[MONTH],TOTAL
from ( select DATENAME(month, order_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, order_date)
)x
And the output looks like this (just dummy data):
ranking Type MONTH TOTAL
1 Birthdays January 12
1 Birthdays April 6
1 Birthdays May 10
2 Sales February 8
2 Sales April 14
2 Sales May 10
3 Orders June 4
3 Orders July 3
3 Orders October 6
3 Orders December 17
I want to get the counts for all three types without using UNION or UNION ALL, that is, with a single query statement (or a more optimized version of this query).
Another approach is to create a CTE with all available ranking values and use CROSS APPLY for it, as shown below.
WITH ranks(ranking) AS (
SELECT * FROM (VALUES (1), (2), (3)) v(r)
)
SELECT
r.ranking,
CASE WHEN r.ranking = 1 THEN 'Birthdays'
WHEN r.ranking = 2 THEN 'Sales'
WHEN r.ranking = 3 THEN 'Orders'
END AS Type,
DATENAME(month, CASE WHEN r.ranking = 1 THEN c.birth_date
WHEN r.ranking = 2 THEN c.sale_date
WHEN r.ranking = 3 THEN c.order_date
END) AS MONTH,
COUNT(*) AS TOTAL
FROM customers c
CROSS APPLY ranks r
GROUP BY r.ranking,
DATENAME(month, CASE WHEN r.ranking = 1 THEN c.birth_date
WHEN r.ranking = 2 THEN c.sale_date
WHEN r.ranking = 3 THEN c.order_date
END)
ORDER BY r.ranking, MONTH
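The same unpivot idea (pair every customer row with each ranking value, pick the matching date column, then group) can be tried on a small in-memory SQLite database. This is a sketch under assumptions: SQLite has no CROSS APPLY or DATENAME, so CROSS JOIN and strftime('%m', ...) stand in for them, months come out as numbers, and the three customer rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (birth_date TEXT, sale_date TEXT, order_date TEXT)"
)
# Hypothetical rows, only to exercise the query.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("1990-01-15", "2021-04-02", "2021-06-10"),
        ("1985-01-20", "2021-04-18", "2021-07-01"),
        ("1992-05-03", "2021-05-09", "2021-06-25"),
    ],
)

query = """
WITH ranks(ranking, event_type) AS (
    VALUES (1, 'Birthdays'), (2, 'Sales'), (3, 'Orders')
)
SELECT ranking, event_type, month_no, COUNT(*) AS total
FROM (
    -- one output row per (customer, ranking) pair, carrying the matching date
    SELECT r.ranking,
           r.event_type,
           CAST(strftime('%m', CASE r.ranking
                                   WHEN 1 THEN c.birth_date
                                   WHEN 2 THEN c.sale_date
                                   ELSE c.order_date
                               END) AS INTEGER) AS month_no
    FROM customers c
    CROSS JOIN ranks r
) unpivoted
GROUP BY ranking, event_type, month_no
ORDER BY ranking, month_no
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The customers table is scanned once and each row is expanded three times, instead of aggregating the table in three separate queries glued together with UNION.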

SQL count distinct over partition by cumulatively

I am using AWS Athena (Presto based) and I have this table named base:
id category year month
1 a 2021 6
1 b 2022 8
1 a 2022 11
2 a 2022 1
2 a 2022 4
2 b 2022 6
I would like to craft a query that counts the distinct values of the categories per id, cumulatively per month and year, but retaining the original columns:
id category year month sumC
1 a 2021 6 1
1 b 2022 8 2
1 a 2022 11 2
2 a 2022 1 1
2 a 2022 4 1
2 b 2022 6 2
I've tried doing the following query with no success:
SELECT id,
category,
year,
month,
COUNT(category) OVER (PARTITION BY id ORDER BY year, month) AS sumC FROM base;
This results in 1, 2, 3, 1, 2, 3 which is not what I'm looking for. I'd rather need something like a COUNT(DISTINCT) inside a window function, though it's not supported as a construct.
I also tried the DENSE_RANK trick:
DENSE_RANK() OVER (PARTITION BY id ORDER BY category)
+ DENSE_RANK() OVER (PARTITION BY id ORDER BY category DESC)
- 1 as sumC
However, because it ignores the ordering by year and month, it just results in 2, 2, 2, 2, 2, 2.
Any help is appreciated!
One option is:
creating a new column that flags when each "category" is seen for the first time (partitioning on "id", "category" and ordering on "year", "month")
computing a running sum over this column, partitioning on "id" only and ordering on "year", "month"
WITH cte AS (
SELECT *,
CASE WHEN ROW_NUMBER() OVER(
PARTITION BY id, category
ORDER BY year, month) = 1
THEN 1
ELSE 0
END AS rn1
FROM base
ORDER BY id,
year,
month
)
SELECT id,
category,
year,
month,
SUM(rn1) OVER(
PARTITION BY id
ORDER BY year, month
) AS sumC
FROM cte
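The two steps (flag the first appearance of each category, then take a running sum of the flags) can be checked against the sample table from the question. This is a sketch that runs on an in-memory SQLite database via Python's sqlite3 module rather than on Athena, and it assumes a SQLite build with window function support (3.25 or later); the names flagged and first_seen are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base (id INTEGER, category TEXT, year INTEGER, month INTEGER)"
)
# Sample rows from the question.
conn.executemany(
    "INSERT INTO base VALUES (?, ?, ?, ?)",
    [
        (1, "a", 2021, 6), (1, "b", 2022, 8), (1, "a", 2022, 11),
        (2, "a", 2022, 1), (2, "a", 2022, 4), (2, "b", 2022, 6),
    ],
)

query = """
WITH flagged AS (
    SELECT id, category, year, month,
           -- 1 on the first row of each (id, category), 0 afterwards
           CASE WHEN ROW_NUMBER() OVER (
                    PARTITION BY id, category
                    ORDER BY year, month) = 1
                THEN 1 ELSE 0
           END AS first_seen
    FROM base
)
SELECT id, category, year, month,
       -- running count of distinct categories seen so far per id
       SUM(first_seen) OVER (PARTITION BY id ORDER BY year, month) AS sumC
FROM flagged
ORDER BY id, year, month
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The sumC column comes out as 1, 2, 2 for id 1 and 1, 1, 2 for id 2, which matches the expected output in the question.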

sql get balance at end of year

I have a transactions table for a single year, where a negative amount indicates a debit transaction and a positive amount indicates a credit transaction.
In any given month, if the number of debit records is less than 3 or the sum of debits is less than 100, I want to charge a fee of 5.
I want to build an SQL query for this in PostgreSQL:
select sum(amount), count(1), date_part('month', date) as month from transactions where amount < 0 group by month;
I am able to get the records at month level, but I am stuck on how to proceed and get the result.
You can start by generating the series of months with generate_series(). Then join that with an aggregate query on transactions, and finally implement the business logic in the outer query:
select sum(t.balance)
- 5 * count(*) filter(where coalesce(t.cnt, 0) < 3 or coalesce(t.debit, 0) < 100) as balance
from generate_series(date '2020-01-01', date '2020-12-01', '1 month') as d(dt)
left join (
select date_trunc('month', date) as dt, count(*) filter(where amount < 0) cnt, sum(amount) as balance,
sum(-amount) filter(where amount < 0) as debit
from transactions t
group by date_trunc('month', date)
) t on t.dt = d.dt
Demo on DB Fiddle:
| balance |
| ------: |
| 2746 |
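The fee rule can be tried on a few invented transactions. This is a sketch under assumptions, not the demo behind the figure above: it runs on SQLite through Python's sqlite3 module instead of PostgreSQL, a recursive CTE replaces generate_series(), CASE expressions replace FILTER, and the column name tx_date and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (tx_date TEXT, amount INTEGER)")
# Hypothetical data: January and March each have 3 debits totalling >= 100,
# February has a single small debit, the other nine months have no rows.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("2020-01-03", 1000), ("2020-01-10", -50), ("2020-01-15", -30),
        ("2020-01-20", -40), ("2020-02-05", -20), ("2020-03-01", -200),
        ("2020-03-02", -10), ("2020-03-09", -5), ("2020-03-20", 100),
    ],
)

query = """
WITH RECURSIVE months(m) AS (
    SELECT 1 UNION ALL SELECT m + 1 FROM months WHERE m < 12
),
per_month AS (
    SELECT CAST(strftime('%m', tx_date) AS INTEGER) AS m,
           SUM(amount) AS balance,
           SUM(CASE WHEN amount < 0 THEN 1 ELSE 0 END) AS debit_cnt,
           SUM(CASE WHEN amount < 0 THEN -amount ELSE 0 END) AS debit_sum
    FROM transactions
    GROUP BY CAST(strftime('%m', tx_date) AS INTEGER)
)
SELECT COALESCE(SUM(p.balance), 0)
       -- 5 per month with fewer than 3 debits or less than 100 in debits;
       -- months without any rows count as 0 debits
       - 5 * SUM(CASE WHEN COALESCE(p.debit_cnt, 0) < 3
                        OR COALESCE(p.debit_sum, 0) < 100
                      THEN 1 ELSE 0 END) AS balance
FROM months
LEFT JOIN per_month p ON p.m = months.m
"""
balance = conn.execute(query).fetchone()[0]
print(balance)
```

With this data the amounts sum to 745 and ten months incur the fee (February plus the nine empty months), so the script prints 695.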
How about this approach?
SELECT
SUM(
CASE
WHEN -usage.amount_s >= 100
AND usage.event_c >= 3
THEN 0
ELSE 5
END
) AS YEAR_FEE
FROM (SELECT 1 AS month UNION
SELECT 2 UNION
SELECT 3 UNION
SELECT 4 UNION
SELECT 5 UNION
SELECT 6 UNION
SELECT 7 UNION
SELECT 8 UNION
SELECT 9 UNION
SELECT 10 UNION
SELECT 11 UNION
SELECT 12
) months
LEFT OUTER JOIN
(
SELECT
sum(amount) AS amount_s,
count(1) event_c,
date_part('month', date) AS month
FROM transactions
WHERE amount < 0
GROUP BY month
) usage ON months.month = usage.month;
First you must use a resultset that returns all the months (1-12) and join it with a LEFT join to your table.
Then aggregate to get the sum of each month's amount and with conditional aggregation subtract 5 from the months that meet your conditions.
Finally use SUM() window function to sum the result of each month:
SELECT DISTINCT SUM(
COALESCE(SUM(t.Amount), 0) -
CASE
WHEN SUM((t.Amount < 0)::int) < 3
OR SUM(CASE WHEN t.Amount < 0 THEN -t.Amount ELSE 0 END) < 100 THEN 5
ELSE 0
END
) OVER () total
FROM generate_series(1, 12, 1) m(month) LEFT JOIN transactions t
ON m.month = date_part('month', t.date) AND date_part('year', t.date) = 2020
GROUP BY m.month
See the demo.
Results:
> | total |
> | ----: |
> | 2746 |
I think you can use the HAVING clause.
Select ( sum(a.total) - (12- count(b.cnt ))*5 ) as result From
(Select sum(amount) as total , 'A' as name from transactions ) as a left join
(Select count(amount) as cnt , 'A' as name
From transactions
where amount <0
group by month(date)
having not(count(amount) <3 or sum(amount) >-100) ) as b
on a.name = b.name
select
sum(amount) - 5*(12-(
select count(*)
from(select date_part('month', date) as month, count(amount), sum(amount)
from transactions
where amount<0
group by date_part('month', date)
having Count(amount)>=3 And Sum(amount)<=-100) as fee_free_months)) as balance
from transactions ;

Display max year and its max month with their corresponding value in oracle?

Year Month Value
2015 1 300
2015 2 400
2010 4 100
2016 7 200
2016 8 300
2017 2 100
2017 3 200
2017 6 400
You might try the following:
SELECT MAX(year), MAX(month) KEEP ( DENSE_RANK FIRST ORDER BY year DESC )
, MAX(value) KEEP ( DENSE_RANK FIRST ORDER BY year DESC, month DESC )
FROM mytable;
If you want the max month per year along with the corresponding value, then you can do this:
SELECT year, MAX(month)
, MAX(value) KEEP ( DENSE_RANK FIRST ORDER BY month DESC )
FROM mytable
GROUP BY year;
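For readers without an Oracle instance, the intended results can be reproduced on the sample data with SQLite via Python's sqlite3 module. Note that this is a sketch using different techniques: SQLite has no KEEP (DENSE_RANK FIRST ...), so ORDER BY with LIMIT 1 and a join against a per-year aggregate are swapped in, and the table name mytable is taken from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (year INTEGER, month INTEGER, value INTEGER)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (2015, 1, 300), (2015, 2, 400), (2010, 4, 100), (2016, 7, 200),
        (2016, 8, 300), (2017, 2, 100), (2017, 3, 200), (2017, 6, 400),
    ],
)

# Latest year, the latest month within that year, and its value.
latest = conn.execute(
    "SELECT year, month, value FROM mytable "
    "ORDER BY year DESC, month DESC LIMIT 1"
).fetchone()

# Latest month of every year with the corresponding value.
per_year = conn.execute(
    """
    SELECT t.year, t.month, t.value
    FROM mytable t
    JOIN (SELECT year, MAX(month) AS max_month
          FROM mytable
          GROUP BY year) x
      ON x.year = t.year AND x.max_month = t.month
    ORDER BY t.year
    """
).fetchall()

print(latest)
print(per_year)
```

On the sample data the first query yields (2017, 6, 400), and the second yields one row per year: (2010, 4, 100), (2015, 2, 400), (2016, 8, 300) and (2017, 6, 400).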
Hope this helps.
You could use:
SELECT *
FROM (SELECT *
FROM tab t
ORDER BY Year DESC, Month DESC) s
WHERE rownum = 1;
Select * from table_name
where year = (select max(year) from table_name)
and month = (select max(month) from table_name where year =
(select max(year) from table_name));
This might be the answer you are looking for. I have used nested queries to reach the desired result.

Finding the interval between dates in SQL Server

I have a table with more than 5 million rows of sales transactions. I would like to find the sum of the date intervals between each customer's three most recent purchases.
Suppose my table looks like this :
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The result that I'm looking for might look like this:
CustomerID IntervalDays
A 802
B 135
I know the query needs to first retrieve the 3 most recent transactions of each customer (based on ServiceStartDate) and then calculate the interval between the startDate and ExpiryDate of his/her transactions.
You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
sum(datediff(day, t1.ServiceExpiryDate, t2.ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
where t2.rn <= 3 -- last three rows
group by t2.customerId;
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
Neither query returns the expected result unless you subtract purchases (or max(rn)) from Intervaldays. But as you only sum two differences, this does not seem correct to me either...
Additional logic must be applied based on your rules regarding:
customer has less than 3 purchases
overlapping intervals
Assuming there are no overlaps, I think you want this:
select customerId,
sum(datediff(day, ServiceStartDate, ServiceExpiryDate)) as Intervaldays
from (select t.*, row_number() over (partition by customerId
order by ServiceStartDate desc) as seqnum
from table t
) t
where seqnum <= 3
group by customerId;
Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
FROM yourTable t )dt
GROUP BY dt.CustomerID
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138
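The gap arithmetic behind these figures can be checked on the six sample rows. This is a sketch under assumptions: it runs on SQLite via Python's sqlite3 module rather than SQL Server, uses julianday() differences in place of DATEDIFF, requires a SQLite build with window functions (3.25 or later), and names the table tab as in the earlier answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (CustomerID TEXT, ProductID TEXT, "
    "ServiceStartDate TEXT, ServiceExpiryDate TEXT)"
)
# Sample rows from the question.
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("A", "X1", "2010-01-01", "2010-06-01"),
        ("A", "X2", "2010-08-12", "2010-12-30"),
        ("B", "X4", "2011-10-01", "2012-01-15"),
        ("B", "X3", "2012-04-01", "2012-06-01"),
        ("B", "X7", "2012-08-01", "2013-10-01"),
        ("A", "X5", "2013-01-01", "2015-06-01"),
    ],
)

query = """
WITH ranked AS (
    SELECT CustomerID,
           ServiceStartDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY ServiceStartDate DESC) AS rn,
           -- expiry date of the purchase just before this one
           LEAD(ServiceExpiryDate) OVER (PARTITION BY CustomerID
                                         ORDER BY ServiceStartDate DESC) AS prev_end
    FROM tab
)
SELECT CustomerID,
       CAST(SUM(julianday(ServiceStartDate) - julianday(prev_end)) AS INTEGER)
           AS IntervalDays
FROM ranked
WHERE rn <= 3          -- three most recent purchases
GROUP BY CustomerID
ORDER BY CustomerID
"""
result = conn.execute(query).fetchall()
print(result)
```

For customer A the gaps are 72 and 733 days, and for customer B they are 77 and 61 days, giving 805 and 138. These differ from the 802 and 135 in the question by 3 for each customer, which is the discrepancy mentioned in the first answer.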