How to prepare month-by-month time series data in R from aggregated data containing part-wise sales by month?

I have a data frame that contains month-wise sales data for many parts.
For example:
Partno Month Qty
Part 1 June 2019 20
Part 1 July 2019 25
Part 1 Sep 2019 30
Part 2 Mar 2019 45
Part 3 Aug 2019 40
Part 3 Nov 2019 21
I want to convert this data into a month-by-month time series like the one below, so that I can turn it into a ts object, which makes time series forecasting easier:
Month Part1 Part 2 Part 3
Jan 0 0 0
Feb 0 0 0
Mar 0 45 0
Apr 0 0 0
May 0 0 0
June 20 0 0
July 25 0 0
Aug 0 0 0
Sept 0 30 0
Oct 0 0 20
Nov 0 0 21
Dec 0 0 0
I am quite baffled as to how this can be carried out in R. Any solution would be very useful, as I plan to build some forecasting models in R.
Looking forward to hearing from you all!

Assume the data DF shown reproducibly in the Note at the end.
First convert DF to zoo, splitting it by the first column and converting the Month column to yearmon class. Then convert that to ts class, extend it to cover Jan to Dec, and set any NAs to 0. (If you don't need the 0 months at the beginning and end, omit the yrs and window lines.)
library(zoo)
z <- read.zoo(DF, split = 1, index = 2, FUN = as.yearmon, format = "%b %Y")
tt <- as.ts(z)
yrs <- as.integer(range(time(tt))) # start and end years
tt <- window(tt, start = yrs[1], end = yrs[2] + 11/12, extend = TRUE)
tt[is.na(tt)] <- 0
tt
giving:
Part 1 Part 2 Part 3
Jan 2019 0 0 0
Feb 2019 0 0 0
Mar 2019 0 45 0
Apr 2019 0 20 0
May 2019 0 0 0
Jun 2019 20 0 0
Jul 2019 25 0 0
Aug 2019 0 0 0
Sep 2019 30 0 0
Oct 2019 0 0 20
Nov 2019 0 0 21
Dec 2019 0 0 0
Note
Lines <- "Partno, Month, Qty
Part 1, Jun 2019, 20
Part 1, Jul 2019, 25
Part 1, Sep 2019, 30
Part 2, Mar 2019, 45
Part 2, Apr 2019, 20
Part 3, Oct 2019, 20
Part 3, Nov 2019, 21"
DF <- read.csv(text = Lines, strip.white = TRUE)
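For readers who want to see the reshaping logic spelled out without zoo, here is a minimal sketch in plain Python (not R). It is only an illustration of the same idea: pivot the long rows into one column per part, extend the index to full calendar years, and fill missing months with 0. The function name to_monthly_table and the dictionary layout are assumptions made for this sketch, not part of the answer above.

```python
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Same rows as DF in the Note: (part, "Mon YYYY", quantity).
rows = [
    ("Part 1", "Jun 2019", 20),
    ("Part 1", "Jul 2019", 25),
    ("Part 1", "Sep 2019", 30),
    ("Part 2", "Mar 2019", 45),
    ("Part 2", "Apr 2019", 20),
    ("Part 3", "Oct 2019", 20),
    ("Part 3", "Nov 2019", 21),
]

def to_monthly_table(rows):
    """Return (parts, table); table maps (year, month_index) to a list of
    quantities, one per part, with 0 for months that have no sales."""
    parts = sorted({part for part, _, _ in rows})
    qty = defaultdict(int)
    years = set()
    for part, month, q in rows:
        name, year = month.split()
        qty[(part, int(year), MONTHS.index(name))] += q
        years.add(int(year))
    table = {}
    # Extend to full calendar years, like window(..., extend = TRUE).
    for year in range(min(years), max(years) + 1):
        for m in range(12):
            table[(year, m)] = [qty.get((part, year, m), 0) for part in parts]
    return parts, table

parts, table = to_monthly_table(rows)
for (year, m), values in table.items():
    print(MONTHS[m], year, *values)
```

Each printed line corresponds to one row of the ts object shown above.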

Related

big query SQL - repeatedly/recursively change a row's column in the select statement based on the values in previous row

I have a table like the one below:
customer | date | end date
1 | jan 1 2021 | jan 30 2021
1 | jan 2 2021 | jan 31 2021
1 | jan 3 2021 | feb 1 2021
1 | jan 27 2021 | feb 26 2021
1 | feb 3 2021 | mar 5 2021
2 | jan 2 2021 | jan 31 2021
2 | jan 10 2021 | feb 9 2021
2 | feb 10 2021 | mar 12 2021
Now I want to update the value in the 'end date' column of a row based on the 'end date' of the previous row and the 'date' of the current row.
If the date in the current row < the end date of the previous row, the end date of the current row should be set to the end date of the previous row.
I want to do this repeatedly for all rows (grouped by customer).
I want the output shown below. I need it in a select statement rather than by updating or inserting into a table.
Note: in the output below, once the second row's end date is updated with the value from the first row (jan 30 2021), the third row's date (jan 3 2021) is evaluated against the updated value in the second row (jan 30 2021), not against the second row's value before the update (jan 31 2021).
customer | date | end date
1 | jan 1 2021 | jan 30 2021
1 | jan 2 2021 | jan 30 2021 [updated because current date < previous end date]
1 | jan 3 2021 | jan 30 2021 [updated because current date < previous end date]
1 | jan 27 2021 | jan 30 2021 [updated because current date < previous end date]
1 | feb 3 2021 | mar 5 2021
2 | jan 2 2021 | jan 31 2021
2 | jan 10 2021 | jan 31 2021 [updated because current date < previous end date]
2 | feb 10 2021 | mar 12 2021
I think I would go this way. I use the data source twice so that the operation can be performed without updating or inserting into the table.
input table:
1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-31
1|2021-01-03|2021-02-01
1|2021-01-27|2021-02-26
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-02-09
2|2021-02-10|2021-03-12
code:
with num_raw_data as (
SELECT row_number() over(partition by customer order by date_init) as num, customer, date_init, date_end
FROM `project-id.data-set.table`
), analyzed_data as(
select r.num,
r.customer,
r.date_init,
r.date_end,
case when date_init<(select date_end from num_raw_data where num=r.num-1 and customer=r.customer and EXTRACT(month FROM r.date_init)=EXTRACT(month FROM date_init)) then 1 else 0 end validation
from num_raw_data r
)
select customer,
date_init,
case when validation !=0 then (select MIN(date_end) from analyzed_data where validation=0 and customer=ad.customer and date_init<ad.date_end) else date_end end as date_end
from analyzed_data ad
order by customer,num
output:
1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-30
1|2021-01-03|2021-01-30
1|2021-01-27|2021-01-30
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-01-31
2|2021-02-10|2021-03-12
I use the validation column from analyzed_data to find out which rows need to change. I'm not sure if it's fast (probably not), but it works for the scenario in your question.
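The rule in the question is sequential: each row is compared against the already-adjusted end date of the row before it. To make that rule concrete, here is a minimal sketch in Python rather than SQL, so it is a reference for the expected behaviour and not a BigQuery solution. The function name carry_forward_end_dates is an assumption for this sketch, and it assumes rows are ordered by date within each customer.

```python
from datetime import date

# Input rows from the question: (customer, date_init, date_end).
rows = [
    (1, date(2021, 1, 1), date(2021, 1, 30)),
    (1, date(2021, 1, 2), date(2021, 1, 31)),
    (1, date(2021, 1, 3), date(2021, 2, 1)),
    (1, date(2021, 1, 27), date(2021, 2, 26)),
    (1, date(2021, 2, 3), date(2021, 3, 5)),
    (2, date(2021, 1, 2), date(2021, 1, 31)),
    (2, date(2021, 1, 10), date(2021, 2, 9)),
    (2, date(2021, 2, 10), date(2021, 3, 12)),
]

def carry_forward_end_dates(rows):
    """Replace a row's end date with the previous row's (already adjusted)
    end date whenever the row starts before that end date."""
    result = []
    prev_customer = None
    prev_end = None
    # Assumption: the intended order is by start date within each customer.
    for customer, date_init, date_end in sorted(rows, key=lambda r: (r[0], r[1])):
        if customer == prev_customer and date_init < prev_end:
            date_end = prev_end
        result.append((customer, date_init, date_end))
        prev_customer, prev_end = customer, date_end
    return result

adjusted = carry_forward_end_dates(rows)
for customer, date_init, date_end in adjusted:
    print(customer, date_init, date_end, sep="|")
```

Because the comparison uses the adjusted value, the logic is a running fold over each customer's rows, which is why a single LAG over the original column is not enough.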

Pivot table times by a table

I have some code for a pivot table, but I also want to multiply each count by a value from another table when the row falls within the criteria.
select *
from
(SELECT
year(dteOccupiedDate) as [year],
moveincharge as type ,
left(datename(month,dteOccupiedDate),3)as [month],
MoveInCharge
FROM dav.Gainshare where strTenancyType = 'LTO' and dteOccupiedDate between '2020-04-01' and '2021-03-31' and moveincharge is not null
) as s
PIVOT
(
count(moveincharge)
FOR [month] IN (jan, feb, mar, apr,
may, jun, jul, aug, sep, oct, nov, dec)
)AS pvt
order by year
this code shows
year type jan feb mar apr may jun jul aug sep oct nov dec
2020 Single 0 0 0 5 1 4 12 12 6 0 0 0
2020 Family 0 0 0 5 1 4 12 12 6 0 0 0
2020 Early-leave 0 0 0 5 1 4 12 12 6 0 0 0
2020 Re-house 0 0 0 5 1 4 12 12 6 0 0 0
There are 4 different moveincharge types. When a row falls into a type, I want the count multiplied by that type's value:
single (150)
family (200)
rehouse (130)
early leave (140)
I can also put these into a table if that makes it easier.
I want to see this
year type jan feb mar apr may jun jul aug sep oct nov dec
2020 Single 0 0 0 750 150 600 1800 1800 900 0 0 0
For example, type Single for July is 12 x 150 = 1800, because each single is worth 150 and there are 12 in that month.
I much prefer conditional aggregation over the bespoke pivot syntax. This is easily accomplished with a join:
select year(gs.dteOccupiedDate) as [year],
sum(case when month(gs.dteOccupiedDate) = 1 then v.charge else 0 end) as jan,
sum(case when month(gs.dteOccupiedDate) = 2 then v.charge else 0 end) as feb,
. . .
from dav.Gainshare gs join
(values ('single', 150), ('family', 200), . . .
) v(moveincharge, charge)
on gs.moveincharge = v.moveincharge
where gs.strTenancyType = 'LTO' and
gs.dteOccupiedDate between '2020-04-01' and '2021-03-31' and
gs.moveincharge is not null
group by year(gs.dteOccupiedDate);
Note: This will return two rows -- one for 2020 and one for 2021. That is what your query does. You might want to remove the group by and the year from the select.
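To show what the join plus conditional aggregation computes, here is a minimal Python sketch. The sample rows are hypothetical, and the lowercase type labels are an assumption (the real values stored in moveincharge may be spelled differently). Unlike the query above, the sketch keeps the type as a grouping key, since the desired output in the question has one row per year and type.

```python
from collections import defaultdict
from datetime import date

# Charge per move-in type, taken from the question.
CHARGES = {"single": 150, "family": 200, "rehouse": 130, "early leave": 140}

# Hypothetical sample rows: (dteOccupiedDate, moveincharge).
rows = [
    (date(2020, 7, 3), "single"),
    (date(2020, 7, 19), "single"),
    (date(2020, 7, 21), "family"),
    (date(2020, 5, 2), "single"),
    (date(2021, 1, 15), "rehouse"),
]

def charge_by_year_type(rows, charges):
    """Sum the per-row charge into 12 monthly buckets per (year, type)."""
    totals = defaultdict(lambda: [0] * 12)
    for occupied, kind in rows:
        # Like an inner join: rows without a matching charge are skipped.
        if kind in charges:
            totals[(occupied.year, kind)][occupied.month - 1] += charges[kind]
    return dict(totals)

totals = charge_by_year_type(rows, CHARGES)
for (year, kind), months in sorted(totals.items()):
    print(year, kind, *months)
```

Summing the charge once per row gives the same result as counting rows and multiplying by the charge afterwards.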

Postgres query for annual sales report by rep. grouped by month

I would like to create an annual sales report table by sales rep, showing all twelve months. The data I have looks more or less like this example:
id | rep | date | price
----------------------------
1 1 2017-01-01 20
2 1 2017-01-20 44
3 2 2017-02-18 13
4 2 2017-03-08 12
5 2 2017-04-01 88
6 2 2017-09-05 67
7 3 2017-01-31 10
8 3 2017-06-01 74
The result I need would be like this:
Rep Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
----------------------------------------------------
1 64 0 0 0 0 0 0 0 0 0 0 0
2 0 13 12 88 0 0 0 0 67 0 0 0
3 10 0 0 0 0 74 0 0 0 0 0 0
What would be the most efficient way to write this query?
One way:
select rep,
sum(case when extract('month' from date) = 1 then price else 0 end ) as Jan,
sum(case when extract('month' from date) = 2 then price else 0 end ) as Feb
-- other months here
from t
group by rep
Another way is to use window functions with FILTER:
SELECT DISTINCT
"rep",
COALESCE(SUM(price) FILTER (WHERE extract('month' from "date") = 1) OVER(PARTITION BY "rep"),0) AS Jan,
COALESCE(SUM(price) FILTER (WHERE extract('month' from "date") = 2) OVER(PARTITION BY "rep"),0) AS Feb
--....
FROM ta;
Warning!
You probably want to partition by YEAR too to avoid summing JAN 2017 and JAN 2018.
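The effect of also grouping by year can be sketched in a few lines of Python. This is only an illustration using the sample rows from the question; the function name monthly_sales is an assumption. The key is (year, rep), so January 2017 and January 2018 would land in different rows.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (id, rep, date, price).
rows = [
    (1, 1, date(2017, 1, 1), 20),
    (2, 1, date(2017, 1, 20), 44),
    (3, 2, date(2017, 2, 18), 13),
    (4, 2, date(2017, 3, 8), 12),
    (5, 2, date(2017, 4, 1), 88),
    (6, 2, date(2017, 9, 5), 67),
    (7, 3, date(2017, 1, 31), 10),
    (8, 3, date(2017, 6, 1), 74),
]

def monthly_sales(rows):
    """Return {(year, rep): [jan, ..., dec]} with 0 for months without sales."""
    report = defaultdict(lambda: [0] * 12)
    for _id, rep, day, price in rows:
        report[(day.year, rep)][day.month - 1] += price
    return dict(report)

report = monthly_sales(rows)
for (year, rep), months in sorted(report.items()):
    print(year, rep, *months)
```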

Trying to pull the required rows from the single table with applying conditional statements on columns in sql server?

I have tried to solve this in any number of ways, but unfortunately I got stuck every time.
source table
id year jan feb mar apr may jun jul aug sep oct nov dec
1234 2014 05 06 12 15 16 17 18 19 20 21 22 23
1234 2013 05 06 12 15 16 17 18 19 20 21 22 23
Task: Assume that we are currently in March 2014 and we need the previous 12 months of data (i.e., from Mar 2013 to Feb 2014); the remaining values need to be zero, except year and id.
Solution:
id year jan feb mar apr may jun jul aug sep oct nov dec
1234 2014 05 06 0 0 0 0 0 0 0 0 0 0
1234 2013 0 0 12 15 16 17 18 19 20 21 22 23
This needs a code solution for SQL Server 2008. I would be very happy if anybody could solve this.
Note:
I got stuck trying to pull the column names dynamically.
You can try this:
select id, year,
case when datediff(month, cast(cast(year as varchar(4)) + '0101' as datetime), getdate()) between 1 and 12 then jan else 0 end as jan,
case when datediff(month, cast(cast(year as varchar(4)) + '0201' as datetime), getdate()) between 1 and 12 then feb else 0 end as feb ....
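The month-difference test is easy to get wrong, so here is a minimal Python sketch of the intended rule using the two rows from the question. It assumes a fixed reference month of March 2014 instead of the current date, and the helper names months_between and mask_trailing_year are made up for this sketch. A value is kept only when its month lies 1 to 12 months before the reference month.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

VALUES = [5, 6, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23]

# Source table from the question, one dict per row.
rows = [
    dict(id=1234, year=2014, **dict(zip(MONTHS, VALUES))),
    dict(id=1234, year=2013, **dict(zip(MONTHS, VALUES))),
]

def months_between(year, month, ref_year, ref_month):
    """Whole calendar months from (year, month) up to the reference month."""
    return (ref_year - year) * 12 + (ref_month - month)

def mask_trailing_year(row, ref_year, ref_month):
    """Zero every month that is not 1..12 months before the reference month."""
    masked = {"id": row["id"], "year": row["year"]}
    for number, name in enumerate(MONTHS, start=1):
        diff = months_between(row["year"], number, ref_year, ref_month)
        masked[name] = row[name] if 1 <= diff <= 12 else 0
    return masked

result = [mask_trailing_year(r, 2014, 3) for r in rows]
for r in result:
    print(r["id"], r["year"], *(r[m] for m in MONTHS))
```

With the reference month fixed at March 2014, the 2014 row keeps only jan and feb, and the 2013 row keeps mar through dec, matching the expected solution table.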

Best way to store aggregated values

We need to store aggregated values for different accounts which summarise various numbers on Month/Year basis. These numbers would be updated each time the data is updated (usually once or twice every 24 hours).
I'm expecting the data to be the results of PIVOT functions e.g.:
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011 0 0 0 0 0 0 95 33 34 24 36 52
Each account will need different aggregates e.g. "Count Of Customers", "Count Of Orders" and "Value Of Sales" and I'm not sure whether it would be best to add a key to the data or use separate tables e.g.:
Year Key Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011 CntOrders 0 0 0 0 0 0 95 33 34 24 36 52
2011 CntCust 0 0 0 0 0 0 95 33 34 24 36 52
2011 ValOrders 0 0 0 0 0 0 95 33 34 24 36 52
Or
dbo.CountOfOrders
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011 0 0 0 0 0 0 95 33 34 24 36 52
dbo.ValueOfOrders
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011 0 0 0 0 0 0 95 33 34 24 36 52
I've read a number of posts suggesting both NoSQL and SQL Server so I'm not sure which way we should go or how to decide.
We can't justify a dedicated cube at the moment but I'm wondering if it would be better to store the values in a NoSQL database or whether we should stick with SQL Server?
I'd stick with SQL. If you are worried about the time needed to rebuild such a PIVOT table, don't be, because you don't necessarily have to build a table with a unique "key".
Build it with key + process datetime and just append it to the main pivot. During creation of the incrementals it will be bounded by your transaction timestamp (begin and end). There shouldn't be much bloat. If there is, you can collapse the process dates in a weekend job.
Set up a job to run stored procedures that insert data into tables.
Store the data like Account,Year,Month,Value
Use views of these tables for reporting multiple aggregates.
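The long layout plus a reporting view can be sketched as follows. This is a minimal illustration, not SQL Server code: it uses SQLite through Python's sqlite3 module only so that it is self-contained. The table name account_metric, the metric column, the view name, and the sample account are all assumptions added for the sketch. INSERT OR REPLACE is SQLite syntax; on SQL Server the refresh job would use its own upsert logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account_metric (
    account TEXT    NOT NULL,
    metric  TEXT    NOT NULL,  -- hypothetical column, e.g. 'CntOrders'
    year    INTEGER NOT NULL,
    month   INTEGER NOT NULL,
    value   REAL    NOT NULL,
    PRIMARY KEY (account, metric, year, month)
);

-- Reporting view: pivots one metric into month columns (two shown here).
CREATE VIEW orders_by_year AS
SELECT account, year,
       SUM(CASE WHEN month = 7 THEN value ELSE 0 END) AS jul,
       SUM(CASE WHEN month = 8 THEN value ELSE 0 END) AS aug
FROM account_metric
WHERE metric = 'CntOrders'
GROUP BY account, year;
""")

# The periodic refresh re-inserts the affected months only.
conn.executemany(
    "INSERT OR REPLACE INTO account_metric VALUES (?, ?, ?, ?, ?)",
    [
        ("acme", "CntOrders", 2011, 7, 95),
        ("acme", "CntOrders", 2011, 8, 33),
        ("acme", "ValOrders", 2011, 7, 1200.0),
    ],
)

report = conn.execute(
    "SELECT account, year, jul, aug FROM orders_by_year"
).fetchall()
print(report)
```

Storing one row per account, metric, year and month means a new aggregate is just a new metric value, and each report shape becomes a view rather than a separate table.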
Definitely stick with SQL. There is no reason to add technical overhead for such a simple task.