Divide rows from yearly to monthly values in pandas

I am trying to divide line items with a start and end date into multiple rows based on months.
Values should be calculated based on number of days in the specific months.
For instance, data for one line item:
id   StartDate   EndDate     Annual
abc  12/12/2018  01/12/2019  120,450
Expected output:
id   Month  Year  Monthly volume
abc  12     2018  6,600
abc  1      2019  10,230
abc  2      2019  9,240
abc  3      2019  10,230
abc  4      2019  9,900
abc  5      2019  10,230
abc  6      2019  9,900
abc  7      2019  10,230
abc  8      2019  10,230
abc  9      2019  9,900
abc  10     2019  10,230
abc  11     2019  9,900

A few things for the next time you ask.
There are existing answers for this case, so always try Google first to reduce duplication. The other post is referenced in the code below.
You should also always include the code you have already tried. SO doesn't like to do your homework, but we will help you with it.
You should include a dataframe that is easy to reproduce, as in the code below, so that I don't have to copy and paste to build it.
You are clearly doing something to convert the Annual total to a monthly volume, but you do not explain what, so do not expect it to be done for you.
Lastly, this code doesn't produce separate month and year columns, but once you have the date, that should be trivial to do (or to Google).
import pandas as pd
df = pd.DataFrame(
data = [['abc','12/12/2018','12/01/2019',120450]],
columns = ['id', 'startDate', 'EndDate', 'Annual']
)
df['startDate'] = pd.to_datetime(df['startDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
# pd.bdate_range(start="2020/12/16", end="2020/12/26", freq="C", weekmask="Sat Sun")
# %%
df_start_end = df.melt(id_vars=['id', 'Annual'],value_name='date')
# credit to u/gen
# https://stackoverflow.com/questions/42151886/expanding-pandas-data-frame-with-date-range-in-columns
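df = (
    df_start_end.groupby('id')
    # one row per month end between the start and end dates;
    # ffill() replaces pad(), which was removed in pandas 2.0
    .apply(lambda x: x.set_index('date')
           .resample('M').ffill())
    .drop(columns=['id','variable'])
    .reset_index()
)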
print(df)
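The question never states how the monthly volume is derived, but the expected figures are consistent with a daily rate of Annual / 365 multiplied by the number of days covered in each month, with the end date treated as exclusive (for example, 20 days in December 2018 × 330 = 6,600). Below is a minimal sketch under that assumption. The helper name `split_by_month` and the day-first reading of the dates are illustrative choices, not part of the original posts.

```python
import pandas as pd

# Assumption: the question's dates are day-first, so the item runs from
# 12 Dec 2018 up to (but not including) 1 Dec 2019.
df = pd.DataFrame({
    "id": ["abc"],
    "StartDate": pd.to_datetime(["2018-12-12"]),
    "EndDate": pd.to_datetime(["2019-12-01"]),
    "Annual": [120450],
})

def split_by_month(row):
    """Return one row per calendar month covered by a single line item."""
    # Every covered day; the end date itself is treated as exclusive.
    days = pd.date_range(row["StartDate"],
                         row["EndDate"] - pd.Timedelta(days=1), freq="D")
    out = pd.DataFrame({"id": row["id"], "Year": days.year, "Month": days.month})
    # Count the covered days in each (Year, Month) pair.
    out = out.groupby(["id", "Year", "Month"]).size().reset_index(name="days")
    # Assumed rule: daily rate = Annual / 365, scaled by the days covered.
    out["Monthly volume"] = out["days"] * row["Annual"] / 365
    return out.drop(columns="days")

monthly = pd.concat([split_by_month(r) for _, r in df.iterrows()],
                    ignore_index=True)
print(monthly)
```

For the sample row this gives twelve rows, from December 2018 to November 2019, with the same monthly volumes as the expected output in the question. If the real rule differs (leap years, inclusive end dates), only the two commented lines need to change.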

Related

Rolling Rows in pandas.DataFrame

I have a dataframe that looks like this:
year  month  valueCounts
2019  1      73.411285
2019  2      53.589128
2019  3      71.103842
2019  4      79.528084
I want valueCounts column's values to be rolled like:
year  month  valueCounts
2019  1      53.589128
2019  2      71.103842
2019  3      79.528084
2019  4      NaN
I can do this by dropping first index of dataframe and assigning last index to NaN but it doesn't look efficient. Is there any simpler method to do this?
Thanks.
Assuming your dataframe is already sorted, use shift:
df['valueCounts'] = df['valueCounts'].shift(-1)
print(df)
# Output
year month valueCounts
0 2019 1 53.589128
1 2019 2 71.103842
2 2019 3 79.528084
3 2019 4 NaN

np.where multi-conditional based on another column

I have two dataframes.
df_1:
Year  ID  Flag
2021  1   1
2020  1   0
2021  2   1
df_2:
Year  ID
2021  1
2020  2
I'm looking to add the flag from df_1 to df_2 based on ID and year. I think I need to use an np.where statement, but I'm having a hard time figuring it out. Any ideas?
You can use pandas.merge() to combine df1 and df2 with an outer join.
df2["Flag"] = pd.NaT
df2["Flag"].update(df2.merge(df1, on=["Year", "ID"], how="outer")["Flag_y"])
print(df2)
Year ID Flag
0 2020 2 NaT
1 2021 1 1.0
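As a more direct alternative, a left join on both key columns is sketched below. It assumes the frames are named `df1` and `df2` and hold exactly the sample rows from the question. Rows of `df2` without a match in `df1` keep a missing Flag.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"Year": [2021, 2020, 2021], "ID": [1, 1, 2], "Flag": [1, 0, 1]})
df2 = pd.DataFrame({"Year": [2021, 2020], "ID": [1, 2]})

# A left join keeps every row of df2, in its original order, and pulls in
# Flag wherever both Year and ID match; unmatched rows get NaN.
result = df2.merge(df1, on=["Year", "ID"], how="left")
print(result)
```

Because there are missing values, the Flag column comes back as float. np.where is not needed here, since the join already does the lookup on both keys.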

How to join columns in Julia?

I have opened a dataframe in Julia where I have 3 columns like this:
day month year
1 1 2011
2 4 2015
3 12 2018
How can I make a new column called date that looks like this:
day month year date
1 1 2011 1/1/2011
2 4 2015 2/4/2015
3 12 2018 3/12/2018
I was trying with this:
df[!,:date]= df.day.*"/".*df.month.*"/".*df.year
but it didn't work.
In R I would do:
df$date=paste(df$day, df$month, df$year, sep="/")
Is there anything similar?
Thanks in advance!
Julia has an inbuilt Date type in its standard library:
julia> using Dates
julia> df[!, :date] = Date.(df.year, df.month, df.day)
3-element Vector{Date}:
2011-01-01
2015-04-02
2018-12-03

Pandas Shift Column & Remove Row

I have a dataframe 'df1' that has 2 columns, and I need to shift the 2nd column down a row and then remove the entire top row of df1.
My data looks like this:
year ER12
0 2017 -2.05
1 2018 1.05
2 2019 -0.04
3 2020 -0.60
4 2021 -99.99
And, I need it to look like this:
year ER12
0 2018 -2.05
1 2019 1.05
2 2020 -0.04
3 2021 -0.60
We can try this:
df = df.assign(ER12=df.ER12.shift()).dropna().reset_index(drop=True)
print(df)
year ER12
0 2018 -2.05
1 2019 1.05
2 2020 -0.04
3 2021 -0.60
This works on your example:
import pandas as pd
df = pd.DataFrame({'year':[2017,2018,2019,2020,2021], 'ER12':[-2.05,1.05,-0.04,-0.6,-99.99]})
df['year'] = df['year'].shift(-1)
df = df.dropna()
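One side effect of shifting the year column is that `shift(-1)` puts a NaN in the last row, so the integer column becomes float and stays float after `dropna()`. A minimal sketch of the same approach that restores the integer dtype afterwards:

```python
import pandas as pd

df = pd.DataFrame({"year": [2017, 2018, 2019, 2020, 2021],
                   "ER12": [-2.05, 1.05, -0.04, -0.6, -99.99]})

# shift(-1) leaves NaN in the last row, which turns the int column into float.
df["year"] = df["year"].shift(-1)
# Drop the incomplete last row, then restore the integer dtype.
df = df.dropna().astype({"year": int})
print(df)
```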

Subsetting pandas dataframe based on two columnar values

I am trying to subset a large dataframe (5000+ rows and 15 columns) based on unique values from two columns (both are dtype = object). I want to exclude rows of data that meet the following criteria:
A column called 'Record' equals "MO" AND a column called 'Year' equals "2017" or "2018".
Here is an example of the dataframe:
df = pd.DataFrame({'A': [1001,2002,3003,4004,5005,6006,7007,8008,9009], 'Record' : ['MO','MO','I','I','MO','I','MO','I','I'], 'Year':[2017,2019,2018,2020,2018,2018,2020,2019,2017]})
print(df)
A Record Year
0 1001 MO 2017
1 2002 MO 2019
2 3003 I 2018
3 4004 I 2020
4 5005 MO 2018
5 6006 I 2018
6 7007 MO 2020
7 8008 I 2019
8 9009 I 2017
I would like any row with both "MO" and "2017", as well as both "MO" and "2018" taken out of the dataframe.
Example where the right rows (0 and 4 in dataframe above) are deleted:
df = pd.DataFrame({'A': [2002,3003,4004,6006,7007,8008,9009], 'Record' : ['MO','I','I','I','MO','I','I'], 'Year':[2019,2018,2020,2018,2020,2019,2017]})
print(df)
A Record Year
0 2002 MO 2019
1 3003 I 2018
2 4004 I 2020
3 6006 I 2018
4 7007 MO 2020
5 8008 I 2019
6 9009 I 2017
I have tried the following code, but it does not work (I tried at first for just one year):
df = df[(df['Record'] != "MO" & df['Year'] != "2017")]
I believe you're just missing some parentheses.
df = df[(df['Record'] != "MO") & (df['Year'] != "2017")]
Edit:
After some clarification:
# & binds tighter than |, so the two year tests need their own parentheses
df = df[~((df['Record'] == 'MO') &
          ((df['Year'] == '2017') |
           (df['Year'] == '2018')))]
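The same filter can be written with `isin`, which avoids the operator-precedence trap when combining `&` and `|`. The sketch below runs against the example frame from the question and assumes Year holds integers, as it does there. If the real column stores strings, the list would be `["2017", "2018"]` instead.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1001, 2002, 3003, 4004, 5005, 6006, 7007, 8008, 9009],
    "Record": ["MO", "MO", "I", "I", "MO", "I", "MO", "I", "I"],
    "Year": [2017, 2019, 2018, 2020, 2018, 2018, 2020, 2019, 2017],
})

# Rows to drop: Record is "MO" AND Year is 2017 or 2018.
# Assumption: Year holds integers, as in the example frame.
to_drop = (df["Record"] == "MO") & df["Year"].isin([2017, 2018])
filtered = df[~to_drop].reset_index(drop=True)
print(filtered)
```

This removes rows 0 and 4 of the example frame and keeps the other seven, matching the desired output in the question.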