How to get the last-quarter and last-half-year average of balance for each month in Hive?

I have a table with columns cust_id, year_, month_, monthly_txn, monthly_bal. For each month, I need to calculate avg(monthly_txn) and variance(monthly_bal) over the previous three months and over the previous six months. The query below returns the three-month average and variance, but only for the latest month, not for each month. I am not very familiar with analytic functions in Hive.
SELECT cust_id, avg(monthly_txn) y, variance(monthly_bal) x FROM (
SELECT cust_id, monthly_txn, monthly_bal,
row_number() over (partition by cust_id order by year_ desc, month_ desc) r
from mytable) b WHERE r <= 3 GROUP BY cust_id
But I want output like the one below.
Input:
cust_id year_ month_ monthly_txn monthly_bal
1 2018 1 456 8979289
1 2018 2 675 4567
1 2018 3 645 4890
1 2017 1 342 44522
1 2017 2 378 9898900
1 2017 3 4536 234492358
1 2017 4 3535 789
1 2017 5 4561 345
1 2017 6 598 334
Expected output (the averages of monthly_txn are shown; the variance of monthly_bal should use the same three- and six-month windows):
cust_id year_ month_ monthly_txn monthly_bal q_avg_txn h_avg_txn
1 2018 1 456 8979289 avg(456,598,4561) avg(456,598,4561,3535,4536,378)
1 2018 2 675 4567 avg(675,456,598) avg(675,456,598,4561,3535,4536)
1 2018 3 645 4890 avg(645,675,456) avg(645,675,456,598,4561,3535)
1 2017 1 342 44522 avg(342) avg(342)
1 2017 2 378 9898900 avg(378,342) avg(378,342)
1 2017 3 4536 234492358 avg(4536,378,342) avg(4536,378,342)
1 2017 4 3535 789 avg(3535,4536,378) avg(3535,4536,378,342)
1 2017 5 4561 345 avg(4561,3535,4536) avg(4561,3535,4536,378,342)
1 2017 6 598 334 avg(598,4561,3535) avg(598,4561,3535,4536,378,342)

Use analytic functions with a PRECEDING window frame (such as ROWS UNBOUNDED PRECEDING) to get the quarterly and half-yearly values, and then use a subquery to get the results. See also: What is ROWS UNBOUNDED PRECEDING used for in Teradata?

If you have data for every month of interest (i.e., no gaps), then this should work:
select t.*,
avg(monthly_txn) over (partition by cust_id
order by year_, month_
rows between 2 preceding and current row
) as avg_3,
avg(monthly_txn) over (partition by cust_id
order by year_, month_
rows between 5 preceding and current row
) as avg_6,
variance(monthly_bal) over (partition by cust_id
order by year_, month_
rows between 2 preceding and current row
) as variance_3,
variance(monthly_bal) over (partition by cust_id
order by year_, month_
rows between 5 preceding and current row
) as variance_6
from mytable t;
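To check the frame logic without a Hive cluster, here is a minimal sketch that runs the same ROWS-based windows in SQLite through Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (needed for window functions) and uses the rows from the question's expected-output table. SQLite has no VARIANCE aggregate, so only the two averages of monthly_txn are reproduced here.

```python
import sqlite3

# Rows from the question's expected-output table:
# (cust_id, year_, month_, monthly_txn, monthly_bal)
rows = [
    (1, 2018, 1, 456, 8979289),
    (1, 2018, 2, 675, 4567),
    (1, 2018, 3, 645, 4890),
    (1, 2017, 1, 342, 44522),
    (1, 2017, 2, 378, 9898900),
    (1, 2017, 3, 4536, 234492358),
    (1, 2017, 4, 3535, 789),
    (1, 2017, 5, 4561, 345),
    (1, 2017, 6, 598, 334),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE mytable (cust_id INTEGER, year_ INTEGER, month_ INTEGER,"
    " monthly_txn INTEGER, monthly_bal INTEGER)"
)
con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)

# Same frames as the Hive answer: the current row plus the 2 (or 5) rows before it.
# Like the Hive version, this assumes exactly one row per month with no gaps.
query = """
SELECT year_, month_, monthly_txn,
       AVG(monthly_txn) OVER (PARTITION BY cust_id ORDER BY year_, month_
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_3,
       AVG(monthly_txn) OVER (PARTITION BY cust_id ORDER BY year_, month_
                              ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS avg_6
FROM mytable
ORDER BY cust_id, year_, month_
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

Each avg_3 and avg_6 value matches the corresponding avg(...) list in the expected-output table; in Hive, variance(monthly_bal) takes the same OVER clauses.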

Related

Sales for each completed month in current year and previous year

For each of the 12 months, I'm looking to create a field that sums the sales dollars at the account level for the most recent month and the 2nd most recent month based on the current date.
For example, given that today's date is 2022-10-28, 'MostRecentNovember' would sum up sales from November 2021. '2ndMostRecentNovember' would sum up sales from November 2020. Once the current date moves into November 2022, this query would adjust to pull MostRecentNovember sales from 2022 and 2ndMostRecentNovember sales from 2021.
Conversely, given that today's date is 2022-10-28 'MostRecentJune' would sum up sales from June 2022 and '2ndMostRecentJune' would sum up sales from June 2021.
In the end state, each account would have 24 fields: January - December for Most Recent and January - December for 2nd most recent
Below is my attempt. It gets partially there, but it doesn't return what I need. I've also tried a CTE, but that didn't work either.
SELECT NovemberMostRecent_Value =
sum(case when datepart(year,tran_date) = datepart(year, getdate())
AND DATEPART(month, tran_date) = 11 then value else 0 end),
NovemberSecondMostRecent_Value =
sum(case when datepart(year,tran_date) = datepart(year, getdate())-1
AND DATEPART(month, tran_date) = 11 then value else 0 end)
Here's a snippet of the source data table:
account_no tran_date value
123 2021-11-22 500
123 2021-11-01 500
123 2020-11-20 1500
123 2022-06-03 5000
123 2021-06-04 2000
456 2020-11-03 525
456 2021-11-04 125
Desired results:
account_no NovemberMostRecent November2ndMostRecent JuneMostRecent June2ndMostRecent
123 1000 1500 5000 2000
456 125 525 0 0
We use dense_rank() by year desc (partitioned by month) and pivot.
select *
from
(
select account_no
,value
,concat(datename(month, tran_date), '_', dense_rank() over(partition by month(tran_date) order by year(tran_date) desc)) as month_rnk
from t
) t
pivot (sum(value) for month_rnk in(June_1, June_2, November_1, November_2)) p
account_no June_1 June_2 November_1 November_2
123 5000 2000 1000 1500
456 null null 125 525
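SQLite has no PIVOT operator, so this sketch (run through Python's sqlite3 module, assuming SQLite 3.25+) swaps in conditional aggregation: one SUM(CASE ...) per output column. It keeps the answer's idea of DENSE_RANK over the year, partitioned by month, and uses strftime in place of T-SQL's datename/month. The table name t and the sample rows come from the question.

```python
import sqlite3

rows = [
    (123, "2021-11-22", 500),
    (123, "2021-11-01", 500),
    (123, "2020-11-20", 1500),
    (123, "2022-06-03", 5000),
    (123, "2021-06-04", 2000),
    (456, "2020-11-03", 525),
    (456, "2021-11-04", 125),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (account_no INTEGER, tran_date TEXT, value INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# rnk = 1 is the most recent year that has data for that calendar month,
# rnk = 2 the one before it; SUM(CASE ...) then plays the role of PIVOT.
query = """
WITH ranked AS (
    SELECT account_no,
           value,
           strftime('%m', tran_date) AS mon,
           DENSE_RANK() OVER (
               PARTITION BY strftime('%m', tran_date)
               ORDER BY strftime('%Y', tran_date) DESC
           ) AS rnk
    FROM t
)
SELECT account_no,
       SUM(CASE WHEN mon = '06' AND rnk = 1 THEN value END) AS June_1,
       SUM(CASE WHEN mon = '06' AND rnk = 2 THEN value END) AS June_2,
       SUM(CASE WHEN mon = '11' AND rnk = 1 THEN value END) AS November_1,
       SUM(CASE WHEN mon = '11' AND rnk = 2 THEN value END) AS November_2
FROM ranked
GROUP BY account_no
ORDER BY account_no
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

Wrapping each SUM in COALESCE(..., 0) would return the 0 the question asks for instead of NULL. Note that the ranks come from the years present in the data, not from the current date, so the two definitions can differ when the latest year has no sales for a given month.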

How to get current and previous month's data in a table?

I have a table in an Oracle database that contains product_id (unique within each month), month, and price data.
MONTH PRODUCT_ID CURRENT_PRICE
2 00011 14
2 00022 60
3 00011 10
3 00022 40
I want to write a SQL query in Oracle that builds the view shown below:
MONTH PRODUCT_ID CURRENT_PRICE PREVIOUS_PRICE CHANGE_RATE
2 00011 14 NULL NULL
2 00022 60 NULL NULL
3 00011 10 14 40
3 00022 40 60 50
where the current and previous prices for each product are listed in one row. How can I write this? Thanks in advance.
Use lag():
select t.*,
lag(current_price) over (partition by product_id order by month) as prev_price,
(-100 + lag(current_price) over (partition by product_id order by month) * 100.0 / current_price) as change_rate
from t
order by month, product_id;
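Here is a minimal sketch that runs the same LAG() query on the question's rows in SQLite via Python's sqlite3 module (assuming SQLite 3.25+). product_id is stored as TEXT so the leading zeros survive; that storage choice is an assumption about the Oracle column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# product_id as TEXT keeps the leading zeros shown in the question.
con.execute("CREATE TABLE t (month INTEGER, product_id TEXT, current_price INTEGER)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(2, "00011", 14), (2, "00022", 60), (3, "00011", 10), (3, "00022", 40)],
)

# LAG returns NULL on each product's first month, so change_rate is NULL there too.
query = """
SELECT month, product_id, current_price,
       LAG(current_price) OVER (PARTITION BY product_id ORDER BY month) AS prev_price,
       -100 + LAG(current_price) OVER (PARTITION BY product_id ORDER BY month)
              * 100.0 / current_price AS change_rate
FROM t
ORDER BY month, product_id
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

This formula reproduces the expected 40 and 50, i.e. it measures the previous price relative to the current one. A conventional percentage change from the previous price, (current_price - prev_price) * 100.0 / prev_price, would instead give about -28.57 and -33.33.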

Computing rolling average and standard deviation by dates

I have the table below and need to compute the rolling average and standard deviation by date. The current table and the expected results are listed below. I am trying to compute the rolling average for each id based on date; rollAvgA is computed from metricA. For example, the first row for each id should return zero, because it has no preceding values. How can this be accomplished?
Current table:
Date id metricA
8/1/2019 100 2
8/2/2019 100 3
8/3/2019 100 2
8/1/2019 101 2
8/2/2019 101 3
8/3/2019 101 2
8/4/2019 101 2
Expected table:
Date id metricA rollAvgA
8/1/2019 100 2 0
8/2/2019 100 3 2.5
8/3/2019 100 2 2.3
8/1/2019 101 2 0
8/2/2019 101 3 2.5
8/3/2019 101 2 2.3
8/4/2019 101 2 2.25
You seem to want a cumulative average. This is basically:
select t.*,
avg(metricA * 1.0) over (partition by id order by date) as rollingavg
from t;
The only caveat is that the first row's value is the average of a single value, whereas you want 0 there. To handle this, use a case expression:
select t.*,
(case when row_number() over (partition by id order by date) > 1
then avg(metricA * 1.0) over (partition by id order by date)
else 0
end) as rollingavg
from t;
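The CASE/ROW_NUMBER version can be checked on the question's data with a small SQLite sketch through Python's sqlite3 module (assuming SQLite 3.25+). Two assumptions: the dates are stored as ISO-8601 text so that text order equals date order, and the column is named dt instead of date.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Dates as ISO-8601 text (an assumption) so that ORDER BY dt is chronological.
con.execute("CREATE TABLE t (dt TEXT, id INTEGER, metricA INTEGER)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2019-08-01", 100, 2), ("2019-08-02", 100, 3), ("2019-08-03", 100, 2),
        ("2019-08-01", 101, 2), ("2019-08-02", 101, 3), ("2019-08-03", 101, 2),
        ("2019-08-04", 101, 2),
    ],
)

# With ORDER BY and no explicit frame, AVG covers every row from the start of
# the partition up to the current one, i.e. a cumulative average.
query = """
SELECT dt, id, metricA,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt) > 1
            THEN AVG(metricA * 1.0) OVER (PARTITION BY id ORDER BY dt)
            ELSE 0
       END AS rollingavg
FROM t
ORDER BY id, dt
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

Because the default frame is RANGE-based, rows of the same id that share a date would be averaged together; adding ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW counts each row separately.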

Filtering within a window function (over ... partition by)?

I am trying to use sum() over (partition by ...) but filter within that sum. My use case is summing the trailing twelve months before each month's entry, for each product. Given:
ITEM MONTH SALES
Item A 1/1/2011 2
Item A 2/1/2011 5
Item A 3/1/2011 3
Item A 4/1/2011 7
Item A 5/1/2011 12
Item A 6/1/2011 8
Item A 7/1/2011 9
Item A 8/1/2011 15
Item A 9/1/2011 6
Item A 10/1/2011 7
Item A 11/1/2011 12
Item A 12/1/2011 1
Item A 1/1/2012 3
Item A 2/1/2012 4
Item A 3/1/2012 5
Item A 4/1/2012 6
Item A 5/1/2012 4
Item A 6/1/2012 8
Item A 7/1/2012 9
Item A 8/1/2012 12
Item A 9/1/2012 14
Item A 10/1/2012 8
Item A 11/1/2012 12
Item A 12/1/2012 16
The query should then return:
ITEM MONTH_BEGIN SALES TTM_SALES
Item A 1/1/2012 3 87
Item A 2/1/2012 4 88
Item A 3/1/2012 5 87
Item A 4/1/2012 6 89
where TTM_SALES for 1/1/2012 is the sum of sales from 1/1/2011 through 12/1/2011.
The query below shows how I would do it with Oracle analytic functions:
SELECT
"ITEM",
TO_CHAR("MONTH", 'MM/DD/YYYY') AS "MONTH_BEGIN",
"SALES",
SUM("SALES") OVER (
PARTITION BY
"ITEM"
ORDER BY
"MONTH"
RANGE BETWEEN
INTERVAL '12' MONTH PRECEDING
AND
INTERVAL '1' MONTH PRECEDING
) AS "TTM_SALES"
FROM
"SALES"
ORDER BY
"MONTH";
This will compute the sum function over a window that starts 12 months before the month of the current row and ends 1 month before it.
I assumed that you do not need to filter anything in the WHERE clause. If you do, be careful. Quoting the Oracle documentation:
Analytic functions are the last set of operations performed in a query
except for the final ORDER BY clause. All joins and all WHERE, GROUP
BY, and HAVING clauses are completed before the analytic functions are
processed.
So let's say that you want to display results only for January through April 2012. If you try to do so by filtering in the WHERE clause, it will also change the cumulative TTM_SALES results (outputting null, 3, 7 and 12).
The bottom line is: if you need to filter out rows that must still count inside the analytic function's window, move the analytic function into a subquery and filter in the outer query, as in @peterm's answer:
SELECT
"X"."ITEM",
TO_CHAR("X"."MONTH", 'MM/DD/YYYY') AS "MONTH_BEGIN",
"X"."SALES",
"X"."TTM_SALES"
FROM
(
SELECT
"ITEM",
"MONTH",
"SALES",
SUM("SALES") OVER (
PARTITION BY
"ITEM"
ORDER BY
"MONTH"
RANGE BETWEEN
INTERVAL '12' MONTH PRECEDING
AND
INTERVAL '1' MONTH PRECEDING
) AS "TTM_SALES"
FROM
"SALES"
) "X"
WHERE
EXTRACT(MONTH FROM "X"."MONTH") BETWEEN 1 AND 4
AND EXTRACT(YEAR FROM "X"."MONTH") = 2012
ORDER BY
"X"."MONTH";
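The difference between filtering before and after the window can be reproduced in SQLite through Python's sqlite3 module. This is a sketch under assumptions: SQLite 3.28+ (for numeric RANGE offsets), dates stored as ISO text, and a month index (year * 12 + month) swapped in for Oracle's INTERVAL '12' MONTH frame, since SQLite cannot use intervals in a window frame.

```python
import sqlite3

# Monthly sales for "Item A" from the question, January 2011 to December 2012.
SALES_2011 = [2, 5, 3, 7, 12, 8, 9, 15, 6, 7, 12, 1]
SALES_2012 = [3, 4, 5, 6, 4, 8, 9, 12, 14, 8, 12, 16]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (item TEXT, month TEXT, sales INTEGER)")
for year, values in ((2011, SALES_2011), (2012, SALES_2012)):
    for m, v in enumerate(values, start=1):
        con.execute("INSERT INTO sales VALUES (?, ?, ?)",
                    ("Item A", f"{year}-{m:02d}-01", v))

# A month index (year * 12 + month) lets a numeric RANGE frame stand in for
# Oracle's INTERVAL '12' MONTH PRECEDING ... INTERVAL '1' MONTH PRECEDING.
TTM = """SUM(sales) OVER (
    PARTITION BY item
    ORDER BY CAST(strftime('%Y', month) AS INTEGER) * 12
           + CAST(strftime('%m', month) AS INTEGER)
    RANGE BETWEEN 12 PRECEDING AND 1 PRECEDING)"""
FILTER = "month BETWEEN '2012-01-01' AND '2012-04-01'"

# Filter first: WHERE runs before the window, so the 2011 rows never reach the frame.
filtered_first = con.execute(
    f"SELECT month, sales, {TTM} AS ttm_sales FROM sales "
    f"WHERE {FILTER} ORDER BY month").fetchall()

# Window first: compute TTM_SALES in a subquery, then filter in the outer query.
windowed_first = con.execute(
    "SELECT month, sales, ttm_sales "
    f"FROM (SELECT month, sales, {TTM} AS ttm_sales FROM sales) AS x "
    f"WHERE {FILTER} ORDER BY month").fetchall()

print([r[2] for r in filtered_first])  # [None, 3, 7, 12]
print([r[2] for r in windowed_first])  # [87, 88, 87, 89]
```

The first list is the null, 3, 7, 12 result described above; the second matches the expected TTM_SALES column.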
If you're open to anything other than an analytic SUM(), here is a possible solution with a simple correlated subquery (written with SQL Server's DATEADD):
SELECT s.item, s.month month_begin, s.sales,
(SELECT SUM(i.sales) FROM sales i
WHERE i.item = s.item
AND i.month BETWEEN DATEADD(month, -12, s.month)
AND DATEADD(month, -1, s.month)) ttm_sales
FROM sales s
WHERE s.month BETWEEN '20120101' AND '20121201'
Sample output:
| ITEM | MONTH_BEGIN | SALES | TTM_SALES |
-----------------------------------------------------------------
| Item A | January, 01 2012 00:00:00+0000 | 3 | 87 |
| Item A | February, 01 2012 00:00:00+0000 | 4 | 88 |
| Item A | March, 01 2012 00:00:00+0000 | 5 | 87 |
| Item A | April, 01 2012 00:00:00+0000 | 6 | 89 |
...
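A sketch of the correlated-subquery approach in SQLite via Python's sqlite3 module, with date(x, '-12 months') standing in for DATEADD(month, -12, x). A hypothetical "Item B" with only 2011 rows is added to show why the subquery is correlated on item as well: without i.item = s.item, Item B's 2011 sales would be added to Item A's TTM_SALES.

```python
import sqlite3

SALES_2011 = [2, 5, 3, 7, 12, 8, 9, 15, 6, 7, 12, 1]
SALES_2012 = [3, 4, 5, 6, 4, 8, 9, 12, 14, 8, 12, 16]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (item TEXT, month TEXT, sales INTEGER)")
for year, values in ((2011, SALES_2011), (2012, SALES_2012)):
    for m, v in enumerate(values, start=1):
        con.execute("INSERT INTO sales VALUES (?, ?, ?)",
                    ("Item A", f"{year}-{m:02d}-01", v))
# Hypothetical second item: 100 per month in 2011, no 2012 rows.
for m in range(1, 13):
    con.execute("INSERT INTO sales VALUES (?, ?, ?)",
                ("Item B", f"2011-{m:02d}-01", 100))

# date(x, '-12 months') plays the role of DATEADD(month, -12, x);
# ISO date strings compare correctly with BETWEEN.
query = """
SELECT s.item, s.month AS month_begin, s.sales,
       (SELECT SUM(i.sales) FROM sales i
         WHERE i.item = s.item
           AND i.month BETWEEN date(s.month, '-12 months')
                           AND date(s.month, '-1 months')) AS ttm_sales
FROM sales s
WHERE s.month BETWEEN '2012-01-01' AND '2012-12-01'
ORDER BY s.item, s.month
"""
result = con.execute(query).fetchall()
for row in result[:4]:
    print(row)
```

The first four rows give 87, 88, 87 and 89, as in the sample output. A correlated subquery runs once per outer row, so on large tables the analytic version is usually the cheaper choice.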

Need to print the highest year and its highest quarter in SQL Server 2012

For each cityprogram, I need to print the highest year and the highest quarter within that year.
Input is in a table:
cityprogram year quarter
=========== ==== =======
Abc 1998 1
Abc 1999 4
Abc 1999 4
Abc 1998 3
xyz 1998 4
xyz 1998 1
xyz 2000 3
It should print
Abc 1999 4
xyz 2000 3
I tried many joins and MAX conditions, but I keep getting quarter 4 for both of them. Thanks.
Use a window function like ROW_NUMBER in a common-table-expression:
WITH CTE AS(
SELECT [cityprogram], [year], [quarter],
RN = ROW_NUMBER() OVER (
PARTITION BY [cityprogram]
ORDER BY [year] DESC, [quarter] DESC)
FROM dbo.TableName
)
SELECT [cityprogram], [year], [quarter]
FROM CTE
WHERE RN = 1
Result:
CITYPROGRAM YEAR QUARTER
Abc 1999 4
xyz 2000 3
ROW_NUMBER returns only one row per group even if there are ties (rows of a cityprogram that share the same highest year and quarter). If you want to show all of the tied rows, replace ROW_NUMBER with DENSE_RANK.
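The ROW_NUMBER versus DENSE_RANK difference shows up on the question's own data, where Abc 1999 Q4 appears twice. This is a minimal sketch in SQLite through Python's sqlite3 module (assuming SQLite 3.25+); the table name TableName mirrors the answer, and the final ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TableName (cityprogram TEXT, year INTEGER, quarter INTEGER)")
con.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [("Abc", 1998, 1), ("Abc", 1999, 4), ("Abc", 1999, 4), ("Abc", 1998, 3),
     ("xyz", 1998, 4), ("xyz", 1998, 1), ("xyz", 2000, 3)],
)


def latest(rank_fn):
    # rank_fn is "ROW_NUMBER" or "DENSE_RANK" (a fixed keyword, not user input);
    # both rank rows by year, then quarter, descending within each cityprogram.
    sql = f"""
    WITH cte AS (
        SELECT cityprogram, year, quarter,
               {rank_fn}() OVER (PARTITION BY cityprogram
                                 ORDER BY year DESC, quarter DESC) AS rn
        FROM TableName
    )
    SELECT cityprogram, year, quarter FROM cte WHERE rn = 1
    ORDER BY cityprogram
    """
    return con.execute(sql).fetchall()


print(latest("ROW_NUMBER"))  # one row per cityprogram
print(latest("DENSE_RANK"))  # both tied Abc 1999 Q4 rows are kept
```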