Good afternoon -
I have a table in Teradata that stores a rolling cumulative sum that resets every month. I would like to be able to calculate the incremental gain between each day of the month. Is this something that I can accomplish with olap functions or should it be handled in a recursive cte? Would love assistance thinking through this. Thanks!
example source
date
month
cum_value
2022-07-02
July 2022
25
2022-07-01
July 2022
5
2022-06-30
June 2022
100
2022-06-29
June 2022
70
2022-06-28
June 2022
65
2022-06-27
June 2022
50
example result
date
month
cum_value
incremental_value
2022-07-02
July 2022
25
20
2022-07-01
July 2022
5
5
2022-06-30
June 2022
100
30
2022-06-29
June 2022
70
5
2022-06-28
June 2022
65
15
2022-06-27
June 2022
50
..
Related
I have data that may at certain times of the year around the first of each year, that a day_of_year sequence involves changing the "year" column to the new year when day_of_year ==1. It is a trick that I have not been able to figure out and in some ways not sure how to start so any help here is much appreciated. My data looks like this:
Here is my df1 =
day_of_year year var_1
364 2017 17.71666667
364 2018 5.166666667
364 2019 2
364 2020 1.595833333
364 2021 3.75
364 2022 6.8875
365 2017 14.83333333
365 2018 2.758333333
365 2019 4.108333333
365 2020 5.766666667
365 2021 5.291666667
365 2022 10.58636364
1 2017 2.0125
1 2018 14.0125
1 2019 -0.504166667
1 2020 7.666666667
1 2021 5.520833333
1 2022 1.229166667
2 2017 1.7625
2 2018 15.10416667
2 2019 -0.391666667
2 2020 9.5
2 2021 7.645833333
2 2022 0.9125
And, after the re-formatting, I need it to look like the below sorted df with "n/a" for any missing or expected data in a year that might be missing data. thank you again,
final df:
day_of_year year var_1
364 2017 17.71666667
365 2017 14.83333333
1 2018 14.0125
2 2018 15.10416667
364 2018 5.166666667
365 2018 2.758333333
1 2019 -0.504166667
2 2019 -0.391666667
364 2019 2
365 2019 4.108333333
1 2020 7.666666667
2 2020 9.5
364 2020 1.595833333
365 2020 5.766666667
1 2021 5.520833333
2 2021 7.645833333
364 2021 3.75
365 2021 5.291666667
1 2022 1.229166667
2 2022 0.9125
364 2022 6.8875
365 2022 10.58636364
n/a n/a n/a
n/a n/a n/a
Why would you change the year based on the day? Just sort by the two columns:
df.sort_values(by=['year', 'day_of_year'])
Output:
day_of_year year var_1
12 1 2017 2.012500
18 2 2017 1.762500
0 364 2017 17.716667
6 365 2017 14.833333
13 1 2018 14.012500
19 2 2018 15.104167
1 364 2018 5.166667
7 365 2018 2.758333
14 1 2019 -0.504167
20 2 2019 -0.391667
2 364 2019 2.000000
8 365 2019 4.108333
15 1 2020 7.666667
21 2 2020 9.500000
3 364 2020 1.595833
9 365 2020 5.766667
16 1 2021 5.520833
22 2 2021 7.645833
4 364 2021 3.750000
10 365 2021 5.291667
17 1 2022 1.229167
23 2 2022 0.912500
5 364 2022 6.887500
11 365 2022 10.586364
If for some reason you really need to fix the year, use a conditional with mask:
(df.assign(year=df['year'].mask(df['day_of_year'].le(2), df['year'].add(1)))
.sort_values(by=['year', 'day_of_year'])
)
Or, if you want to update the years after a change from 365 to a lower day:
(df.assign(year=df['year'].add(df['day_of_year'].diff().lt(0).cumsum()))
.sort_values(by=['year', 'day_of_year'])
)
Output:
day_of_year year var_1
0 364 2017 17.716667
6 365 2017 14.833333
12 1 2018 2.012500
18 2 2018 1.762500
1 364 2018 5.166667
7 365 2018 2.758333
13 1 2019 14.012500
19 2 2019 15.104167
2 364 2019 2.000000
8 365 2019 4.108333
14 1 2020 -0.504167
20 2 2020 -0.391667
3 364 2020 1.595833
9 365 2020 5.766667
15 1 2021 7.666667
21 2 2021 9.500000
4 364 2021 3.750000
10 365 2021 5.291667
16 1 2022 5.520833
22 2 2022 7.645833
5 364 2022 6.887500
11 365 2022 10.586364
17 1 2023 1.229167
23 2 2023 0.912500
I would convert everything to date time first. Just run:
pd.to_datetime(df['day_of_year'].astype(str) + '-' + df['year'].astype(str),
format='%j-%Y')
I assign it to column ymd and sort, yielding the following:
>>> df.sort_values('ymd')
day_of_year year var_1 ymd
12 1 2017 2.012500 2017-01-01
18 2 2017 1.762500 2017-01-02
0 364 2017 17.716667 2017-12-30
6 365 2017 14.833333 2017-12-31
13 1 2018 14.012500 2018-01-01
19 2 2018 15.104167 2018-01-02
1 364 2018 5.166667 2018-12-30
7 365 2018 2.758333 2018-12-31
14 1 2019 -0.504167 2019-01-01
20 2 2019 -0.391667 2019-01-02
2 364 2019 2.000000 2019-12-30
8 365 2019 4.108333 2019-12-31
15 1 2020 7.666667 2020-01-01
21 2 2020 9.500000 2020-01-02
3 364 2020 1.595833 2020-12-29
9 365 2020 5.766667 2020-12-30
16 1 2021 5.520833 2021-01-01
22 2 2021 7.645833 2021-01-02
4 364 2021 3.750000 2021-12-30
10 365 2021 5.291667 2021-12-31
17 1 2022 1.229167 2022-01-01
23 2 2022 0.912500 2022-01-02
5 364 2022 6.887500 2022-12-30
11 365 2022 10.586364 2022-12-31
I am trying to merge two dataframes with different time delta. One represents the returns of an asset (df2) on a daily basis and the other one is the inflation rate (df1) which is published once a month but not in a regular inverval. I am trying to merge those two.
df1 =
First Release
Original Release Date
30 Jun 2010 10:01 1.4%
30 Jul 2010 10:00 1.7%
31 Aug 2010 10:00 1.6%
30 Sep 2010 10:00 1.8%
29 Oct 2010 10:02 1.9%
... ...
17 Mar 2022 11:00 5.9%
21 Apr 2022 10:00 7.4%
18 May 2022 10:00 7.4%
17 Jun 2022 10:00 8.1%
19 Jul 2022 10:00 8.6%
[145 rows x 1 columns]
df2 =
Date
2010-08-11 -0.001654
2010-08-12 -0.028538
2010-08-13 0.001072
2010-08-16 -0.007665
2010-08-17 0.002667
...
2022-01-25 0.029663
2022-01-26 0.026082
2022-01-27 -0.000115
2022-01-28 0.002425
2022-01-31 0.007184
Obviously inflation rate should be placed in the new column from the day after it is released until there is a new release. For example 30. June is the first anouncement and 30 Jul the second. So from 1. July to the 30. July should be 1.4 %. The result is published on the 30. but to avoid look-ahead-bias it is more appropriate to have it . Does someone have an idea or maybe encountered some similar problem ?
I have data in the table that contains the surcharge price of many carriers in many years. Each carrier will have 12 months.
CARRIER_ID YEAR MONTH RATE
DHL 2021 April 16.5
DHL 2021 August 18.5
DHL 2021 December 0
DHL 2021 February 14
DHL 2021 January 12.5
DHL 2021 July 17.75
DHL 2021 June 17
DHL 2021 March 15
DHL 2021 May 17
DHL 2021 November 0
DHL 2021 October 0
DHL 2021 September 0
FedEx 2021 April 16.5
FedEx 2021 August 17.5
FedEx 2021 December 0
FedEx 2021 February 14.5
FedEx 2021 January 13.5
FedEx 2021 July 17.5
FedEx 2021 June 17
FedEx 2021 March 16
FedEx 2021 May 16.5
FedEx 2021 November 0
FedEx 2021 October 0
FedEx 2021 September 0
And I want to make a query in SQL server to get data like this.
Please note that: The data need to group by the year(exp: 2021)
Month DHL FedEx
January 12.50% 13.50%
February14.00% 14.50%
March 15.00% 16.00%
April 16.50% 16.50%
May 17.00% 16.50%
June 17.00% 17.00%
July 17.75% 17.50%
August 18.50% 17.50%
September0 0
October 0 0
November0 0
December0 0
I did search in google but can not find the solution.
Pls give me how to do it.
Thank you so much.
If you do know your list of carriers, you can do it like this with standard sql
select
t.YEAR,
t.MONTH,
max(case when t.CARRIER_ID = 'DHL' then t.RATE else NULL) as DHL,
max(case when t.CARRIER_ID = 'FedEx' then t.RATE else NULL) as FedEx
from your_table t
group by t.YEAR, t.MONTH
order by t.YEAR, t.MONTH
YEAR and MONTH are usually reserved words, so it's not recommended to use them in your data.
I have to calculate sales based on WeekNo(Yearly) and Month.
Table1: Sales
ID SalDate Amount Region
1 2020-12-27 1000 USA
2 2020-12-28 1000 EU
3 2020-12-29 1000 AUS
4 2021-01-01 1000 USA
5 2021-01-02 1000 EU
6 2021-01-05 1000 AUS
7 2020-09-30 1000 EU
8 2020-10-01 1000 AUS
Select DateName(Month,SalDate)+' - '+Convert(Varchar(10),Year(SalDate)) Months,
'Week '+ Convert(Varchar(50), DATEPART(WEEK, SalDate)) As Weeks,
Sum(Amount) OrderValue
From [Sales]
Group By DateName(Month,SalDate)+' - '+Convert(Varchar(10),Year(SalDate)),
Convert(Varchar(50), DATEPART(WEEK, SalDate))
Months Weeks Total
January - 2021 Week 1 2000.00
January - 2021 Week 2 1000.00
December - 2020 Week 53 3000.00
October - 2020 Week 40 1000.00
September - 2020 Week 40 1000.00
But client want to merge Week1 Total with Week 53
And Week2 show as Week1.
And Week 40 Merge with Sep 2020.
I want result like below
Months Weeks Total
January - 2021 Week 1 1000.00
December - 2020 Week 53 5000.00 (2000+3000)
September - 2020 Week 40 2000.00 (1000+100)
Kindly help me to fix this problem.
I have data in two tables ORDERS and training.
Different kinds of training are provided to various customers. I would like to see what was the affect of these training on revenue for the customers involved. For achieving this, I would like to look at the revenue for the past 90 days and the future 90 days for each customer from the date of receiving the training. In other words, if a customer received a training on March 30 2014, I would like to look at the revenue from Jan 1 2014 till June 30 2014. I have come up with the following query which pretty much does the job:
select
o.custno,
sum(ISNULL(a.revenue,0)) as Revenue,
o.Date as TrainingDate,
DATEADD(mm, DATEDIFF(mm, 0, a.created), 0) as RevenueMonth
from ORDERS a,
(select distinct custno, max(Date) as Date from Training group by custno) o
where a.custnum = o.custno
and a.created between DATEADD(day, -90, o.Date) and DATEADD(day, 90, o.Date)
group by o.custno, o.Date, DATEADD(mm, DATEDIFF(mm, 0, a.created), 0)
order by o.custno
The sample output of this query looks something like this:
custno Revenue TrainingDate RevenueMonth
0000100 159.20 2014-06-02 00:00:00.000 2014-03-01 00:00:00.000
0000100 199.00 2014-06-02 00:00:00.000 2014-04-01 00:00:00.000
0000100 79.60 2014-06-02 00:00:00.000 2014-05-01 00:00:00.000
0000100 29.85 2014-06-02 00:00:00.000 2014-06-01 00:00:00.000
0000100 79.60 2014-06-02 00:00:00.000 2014-07-01 00:00:00.000
0000100 99.50 2014-06-02 00:00:00.000 2014-08-01 00:00:00.000
0000250 437.65 2013-02-26 00:00:00.000 2012-11-01 00:00:00.000
0000250 4181.65 2013-02-26 00:00:00.000 2012-12-01 00:00:00.000
0000250 4146.80 2013-02-26 00:00:00.000 2013-01-01 00:00:00.000
0000250 6211.93 2013-02-26 00:00:00.000 2013-02-01 00:00:00.000
0000250 2199.72 2013-02-26 00:00:00.000 2013-03-01 00:00:00.000
0000250 4452.65 2013-02-26 00:00:00.000 2013-04-01 00:00:00.000
Desired output example:
If the training was provided on March 15 2014, for customer number 100, I’d want revenue data in the following format:
CustNo Revenue TrainingDate RevenueMonth
100 <Some revenue figure> March 15 2014 Dec 15 2013 – Jan 14 2014 (Past month 1)
100 <Some revenue figure> March 15 2014 Jan 15 2014 – Feb 14 2014 (Past month 2)
100 <Some revenue figure> March 15 2014 Feb 15 2014 – Mar 14 2014 (Past month 3)
100 <Some revenue figure> March 15 2014 Mar 15 2014 – Apr 14 2014 (Future month 1)
100 <Some revenue figure> March 15 2014 Apr 15 2014 – May 14 2014 (Future month 2)
100 <Some revenue figure> March 15 2014 May 15 2014 – Jun 14 2014 (Future month 3)
Here, the RevenueMonth column doesn’t need to be in this format as long as it has the data relative to the training date. The ‘past’ and ‘future’ month references in the braces are only to explain the output, they need not be present in the output.
My query gets the data and groups the data by month. I would like the months to be grouped relative to the training date. For example - If the training was given on March 15, I would like the past month to be from Feb 15 till March 14, my query doesn't do that. I believe a little tweak in this query might just achieve what I'm after.
Any help with the query would be highly appreciated.
Something along these lines may do what you want:
select
t.custno,
sum(ISNULL(o.revenue,0)) as Revenue,
t.maxDate as TrainingDate,
((DATEDIFF(day, TrainingDate, o.created) + 89) / 30) - 3 as DeltaMonth
from ORDERS o
join (select custno, max([Date]) as maxDate from Training group by custno) t
on o.custnum = t.custno
where o.created
between DATEADD(day, -89, t.maxDate) and DATEADD(day, 90, t.maxDate)
group by t.custno, t.maxDate, DeltaMonth
order by t.custno
The general strategy is to compute a difference in months (or 30-day periods, really) from the training date, and group by that. This version uses from 89 days before to 90 days after the training, because if you run from -90 to +90 then you have one more day (the training day itself) than divides evenly into 30-day periods.
The query follows the general structure of the original, but there are several changes:
it computes DeltaMonth (from -3 to 2) as an index of 30-day periods relative to the training date, with the training date being the last day of DeltaMonth number -1.
it uses the DeltaMonth alias in the GROUP BY clause instead of repeating its formula. That provides better clarity, simplicity, and maintainability.
I changed the table aliases. I simply could not handle there being a table named "ORDERS" and an alias "o", with the latter not being associated with the former.
the query uses modern join syntax, for better clarity
the GROUP BY clause updated appropriately