Last 3 months average next to current month value in Hive

I have a table with the monthly sales values for each item. For each item, I need the average sales of the previous 3 months next to the current month's sales.
I need to do this in Hive.
The sample input table looks like this:
Item_ID Sales Month
A 4295 Dec-2018
A 245 Nov-2018
A 1337 Oct-2018
A 3290 Sep-2018
A 2000 Aug-2018
B 856 Dec-2018
B 1694 Nov-2018
B 4286 Oct-2018
B 2780 Sep-2018
B 3100 Aug-2018
The result table should look like this
Item_ID Sales_Current_Month Month Sales_Last_3_months_average
A 4295 Dec-2018 1624
A 245 Nov-2018 2209
B 856 Dec-2018 2920
B 1694 Nov-2018 3388.67

Assuming no month is missing for any item, you can use the avg window function to do this:
select t.*
,avg(sales) over(partition by item_id order by month rows between 3 preceding and 1 preceding) as avg_sales_prev_3_months
from tbl t
If the month column is not in a sortable format such as yyyyMM, convert it first so the ordering works as expected. A string like Dec-2018 sorts alphabetically, not chronologically.
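To see what the frame "3 preceding and 1 preceding" produces on the sample data, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in for Hive here (it needs version 3.25 or later for window functions), and month_key is a hypothetical yyyyMM version of the Month column, not something from the question.

```python
import sqlite3

# Sample rows from the question. month_key is a hypothetical yyyyMM
# rendering of the Month column so that it sorts chronologically.
rows = [
    ("A", 4295, "201812"), ("A", 245, "201811"), ("A", 1337, "201810"),
    ("A", 3290, "201809"), ("A", 2000, "201808"),
    ("B", 856, "201812"), ("B", 1694, "201811"), ("B", 4286, "201810"),
    ("B", 2780, "201809"), ("B", 3100, "201808"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (item_id text, sales integer, month_key text)")
conn.executemany("insert into tbl values (?, ?, ?)", rows)

# Same window as in the answer: the three rows before the current one,
# excluding the current row itself.
query = """
select item_id, month_key, sales,
       avg(sales) over (partition by item_id
                        order by month_key
                        rows between 3 preceding and 1 preceding
                       ) as avg_sales_prev_3_months
from tbl
order by item_id, month_key desc
"""
prev_avg = {(item, month): avg for item, month, _sales, avg in conn.execute(query)}

for key, avg in prev_avg.items():
    print(key, avg)
```

Note that the earliest months of each item get an average over fewer than three rows, and the first month gets NULL because its frame is empty. If only rows with a full three-month history are wanted, as in the expected result, they would need to be filtered afterwards.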

Related

How to sum with specific range of records

I have a table:
tax number yearmonth(int)
100 45 202105
2 45 202104
35 45 202102
47 45 202012
58 45 202005
I am trying to compute, for every number, the sum of tax over the last 12 months.
For instance, for 202105 I need the sum over the months between (202012 - 202001).
The main problem: not every number has all 12 months.
I tried an OVER clause, but it sums the 12 preceding records regardless of their dates. It does not take missing months into account.
case when yearmonth-lag(yearmonth,1) OVER ( order by number, yearmonth) <> (0) then
sum([tax]) OVER (
PARTITION BY [number]
ORDER BY yearmonth
Rows BETWEEN 11 PRECEDING AND CURRENT ROW ) end
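One possible approach, sketched here under assumptions, is to turn yearmonth into a running month index (year * 12 + month) and use a RANGE frame on that index, so that gaps in the data shrink the window instead of pulling in older rows. The sketch assumes the intended window is the current month plus the 11 months before it, which the question does not state unambiguously. It uses Python's sqlite3 module as a stand-in engine (SQLite 3.28 or later for numeric RANGE offsets); the table name tax_tbl is hypothetical, and whether the target database accepts numeric RANGE offsets would need checking.

```python
import sqlite3

# Rows from the question: (tax, number, yearmonth).
rows = [
    (100, 45, 202105), (2, 45, 202104), (35, 45, 202102),
    (47, 45, 202012), (58, 45, 202005),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table tax_tbl (tax integer, number integer, yearmonth integer)")
conn.executemany("insert into tax_tbl values (?, ?, ?)", rows)

# (yearmonth / 100) * 12 + yearmonth % 100 counts months on a continuous
# scale, so "11 preceding" means 11 calendar months, not 11 rows.
query = """
select yearmonth,
       sum(tax) over (partition by number
                      order by (yearmonth / 100) * 12 + yearmonth % 100
                      range between 11 preceding and current row
                     ) as tax_last_12_months
from tax_tbl
order by yearmonth
"""
tax_sums = dict(conn.execute(query).fetchall())

for yearmonth, total in tax_sums.items():
    print(yearmonth, total)
```

With this frame, the row for 202105 no longer includes 202005, because that month lies 12 months back and falls outside the window.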

Combine two rows for the same month, each containing null values, into one row

I would like to have a dataframe where 1 row only contains one month of data.
month cust_id closed_deals cum_closed_deals checkout cum_checkout
2019-10-01 1 15 15 null null
2019-10-01 1 null 15 210 210
2019-11-01 1 27 42 null 210
2019-11-01 1 null 42 369 579
Expected result:
month cust_id closed_deals cum_closed_deals checkout cum_checkout
2019-10-01 1 15 15 210 210
2019-11-01 1 27 42 369 579
At first, I thought a normal groupby would work, but when I tried to group by only "month" and "cust_id", I got an error saying that closed_deals and checkout also need to be in the groupby.
You may simply aggregate by the (first of the) month and cust_id and take the max of all other columns. MAX ignores nulls, so each aggregate picks up the non-null value in the group:
SELECT
month,
cust_id,
MAX(closed_deals) AS closed_deals,
MAX(cum_closed_deals) AS cum_closed_deals,
MAX(checkout) AS checkout,
MAX(cum_checkout) AS cum_checkout
FROM yourTable
GROUP BY
month,
cust_id;
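As a quick check, the sketch below runs that aggregation on the four sample rows using Python's sqlite3 module as a stand-in engine. The table name yourTable comes from the answer; everything else is just the data from the question.

```python
import sqlite3

# The four rows from the question; None becomes SQL NULL.
rows = [
    ("2019-10-01", 1, 15, 15, None, None),
    ("2019-10-01", 1, None, 15, 210, 210),
    ("2019-11-01", 1, 27, 42, None, 210),
    ("2019-11-01", 1, None, 42, 369, 579),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table yourTable (month text, cust_id integer, closed_deals integer, "
    "cum_closed_deals integer, checkout integer, cum_checkout integer)"
)
conn.executemany("insert into yourTable values (?, ?, ?, ?, ?, ?)", rows)

# MAX skips NULLs, so each group collapses to its non-null values.
query = """
select month, cust_id,
       max(closed_deals)     as closed_deals,
       max(cum_closed_deals) as cum_closed_deals,
       max(checkout)         as checkout,
       max(cum_checkout)     as cum_checkout
from yourTable
group by month, cust_id
order by month
"""
merged = conn.execute(query).fetchall()

for row in merged:
    print(row)
```

Taking the max is safe for this data because each plain column has at most one non-null value per group and the cumulative columns never decrease within a month.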

Total revenue of an account for the preceding 12 months - Redshift SQL

My question is about SQL. I want to find the total revenue of a parent account for the last 12 months.
The data will look something like this
revenue name month year
10000 abc 201001 2010-01-12
10000 abc 201402 2014-02-14
2000 abc 201404 2014-04-12
3000 abc 201406 2014-06-30
30000 def 201301 2013-01-14
6000 def 201304 2013-04-12
9000 def 201407 2013-07-19
And the output should be something like this
revenue name month year Running_Sum
10000 abc 201001 2010-01-12 10000
10000 abc 201402 2014-02-14 10000
2000 abc 201404 2014-04-12 12000
3000 abc 201406 2014-06-30 15000
30000 def 201301 2013-01-14 30000
6000 def 201304 2013-04-12 36000
9000 def 201407 2013-07-19 45000
I have tried a window function like the one below, which expresses the logic that I need:
select revenue, name, date, month,
sum(revenue) over (partition by name order by month rows between '12 months' preceding AND CURRENT ROW )
from table
but the above command gives a syntax error
Redshift does not support intervals in the window frame specification.
So, convert to a number. A convenient one in this case is the number of months since some point in time:
select revenue, name, date, month,
sum(revenue) over (partition by name
order by datediff(month, '1900-01-01', month)
range between 12 preceding and current row
)
from table;
I will note that your logic adds up data from 13 months, not 12. I suspect you want between 11 preceding and current row.
You can use rows between if you have data for all months:
sum(revenue) over (partition by name
order by datediff(month, '1900-01-01', month)
rows between 12 preceding and current row
)
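The difference between the range and rows frames shows up as soon as months are missing. The sketch below compares both on the abc rows from the question, using "11 preceding" for a 12-month window as suggested above. It runs on Python's sqlite3 module as a stand-in (SQLite 3.28 or later), not on Redshift, so whether Redshift accepts a numeric range offset should be verified separately. The names revenue_tbl, rev_date and month_idx are hypothetical, and the month index is derived from the date with strftime because SQLite has no datediff.

```python
import sqlite3

# The "abc" rows from the question: (revenue, name, date).
rows = [
    (10000, "abc", "2010-01-12"),
    (10000, "abc", "2014-02-14"),
    (2000, "abc", "2014-04-12"),
    (3000, "abc", "2014-06-30"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table revenue_tbl (revenue integer, name text, rev_date text)")
conn.executemany("insert into revenue_tbl values (?, ?, ?)", rows)

# month_idx is a continuous month counter (year * 12 + month).
# RANGE looks 11 months back; ROWS looks 11 rows back, whatever their dates.
query = """
select rev_date,
       sum(revenue) over (partition by name order by month_idx
                          range between 11 preceding and current row) as sum_range,
       sum(revenue) over (partition by name order by month_idx
                          rows between 11 preceding and current row) as sum_rows
from (
    select *,
           cast(strftime('%Y', rev_date) as integer) * 12
           + cast(strftime('%m', rev_date) as integer) as month_idx
    from revenue_tbl
)
order by rev_date
"""
running = conn.execute(query).fetchall()

for row in running:
    print(row)
```

Here the range column reproduces the Running Sum values expected for abc, while the rows column also adds the 2010 revenue to the 2014 rows because it counts rows rather than months.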

Creating a Separate Column for Prior Week Values (PostgreSQL)

How would I go about having a separate column that shows the prior week's value? For example, if Product A's value for 01/03/2021 was 100, I would like 01/10/2021 to show its date value as well as the 01/03/2021 value in a separate column.
Desired table below (for simplicity's sake, I added random numbers for the prior week values for 01/03 and 01/04):
Date        Product    Value  Prior Week Value
01/03/2021  Product A  100    50
01/04/2021  Product A  200    55
01/10/2021  Product A  600    100
01/11/2021  Product A  700    200
01/03/2021  Product B  250    40
01/04/2021  Product B  550    45
01/10/2021  Product B  460    250
01/11/2021  Product B  100    550
If you want exactly 7 days before, you can use window functions with a range specification:
select t.*,
max(value) over (partition by product
order by date
range between '7 day' preceding and '7 day' preceding
) as value_prev_week
from t;
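The same "exactly 7 days before" frame can be tried out without a PostgreSQL server. The sketch below is an approximation under assumptions: it uses Python's sqlite3 module (SQLite 3.28 or later), which has no interval type, so the dates are converted to a day number with julianday and the frame offset is the plain number 7. Only the Product A rows are loaded, with dates written in ISO format.

```python
import sqlite3

# Product A rows from the question, with dates rewritten as YYYY-MM-DD.
rows = [
    ("2021-01-03", "Product A", 100),
    ("2021-01-04", "Product A", 200),
    ("2021-01-10", "Product A", 600),
    ("2021-01-11", "Product A", 700),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, product text, value integer)")
conn.executemany("insert into t values (?, ?, ?)", rows)

# The frame contains only rows whose day number is exactly 7 less than
# the current row's. If no such row exists, the frame is empty and
# max() returns NULL.
query = """
select date, value,
       max(value) over (partition by product
                        order by cast(julianday(date) as integer)
                        range between 7 preceding and 7 preceding
                       ) as value_prev_week
from t
order by product, date
"""
prev_week = conn.execute(query).fetchall()

for row in prev_week:
    print(row)
```

The first two dates have no row seven days earlier in this small sample, so their prior week value comes back as NULL rather than the placeholder numbers shown in the desired table.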

Multiple day on day changes based on dates in data as not continuous

See table A. It holds the number of sales per date. The dates are not continuous.
I want table B, which gives the change in sales relative to the previous date in the dataset.
I am trying to do it in SQL but got stuck. I can compute an individual day-on-day difference by entering the dates, but I want a query where I don't need to enter the dates manually.
A
Date Sales
01/01/2019 100
05/01/2019 200
12/01/2019 50
25/01/2019 25
31/01/2019 200
B
Date DOD Move
01/01/2019 -
05/01/2019 +100
12/01/2019 -150
25/01/2019 -25
31/01/2019 +175
Use lag():
select t.*,
(sales - lag(sales) over (order by date)) as dod_move
from t;
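A minimal sketch of this query on the data from table A, run through Python's sqlite3 module as a stand-in engine (SQLite 3.25 or later for lag). The dates are assumed to be day/month/year and are stored in ISO format so that ordering by them is chronological.

```python
import sqlite3

# Table A from the question, with DD/MM/YYYY dates rewritten as YYYY-MM-DD.
rows = [
    ("2019-01-01", 100),
    ("2019-01-05", 200),
    ("2019-01-12", 50),
    ("2019-01-25", 25),
    ("2019-01-31", 200),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, sales integer)")
conn.executemany("insert into t values (?, ?)", rows)

# lag(sales) returns the sales of the previous row in date order,
# however many calendar days lie in between.
query = """
select date,
       sales - lag(sales) over (order by date) as dod_move
from t
order by date
"""
moves = conn.execute(query).fetchall()

for row in moves:
    print(row)
```

The first date has no previous row, so its move is NULL, which corresponds to the "-" in table B.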