Get column names as an index level in a MultiIndex data frame

I have this data frame:
Abacate Abóbora (inclui butternut) Alface Alho
Region years
PT 1986 NaN NaN NaN NaN
1987 NaN NaN NaN NaN
1988 NaN NaN NaN NaN
1989 NaN NaN NaN NaN
1990 NaN NaN NaN NaN
... ... ... ... ... ...
3 2017 NaN NaN NaN NaN
2018 NaN NaN NaN NaN
2019 50.0 NaN NaN NaN
2020 50.0 NaN NaN NaN
2021 50.0 NaN NaN NaN
324 rows × 95 columns
How can I reshape this data frame so that it has three index levels, i.e., how can I turn all the column names into a third index level, like this:
Region years Products Productivity
PT 1986 Abacate NaN
Abóbora (inclui butternut) NaN
Alface NaN
Alho NaN
1987 Abacate NaN
Abóbora (inclui butternut) NaN
Alface NaN
Alho NaN
1988 Abacate NaN
Abóbora (inclui butternut) NaN
Alface NaN
Alho NaN
1989 Abacate NaN
Abóbora (inclui butternut) NaN
Alface NaN
Alho NaN
1990 Abacate NaN
Abóbora (inclui butternut) NaN
Alface NaN
Alho NaN
... ... ... ... ... ...
3 2017 Abacate NaN
Abóbora (inclui butternut) NaN
Alface NaN
Alho NaN
2018 Abacate NaN
Abóbora (inclui butternut) NaN
Alface NaN
Alho NaN
2019 Abacate 50.0
Abóbora (inclui butternut) NaN
Alface NaN
Alho NaN
2020 Abacate 50.0
Abóbora (inclui butternut) NaN
Alface NaN
Alho NaN
2021 Abacate 50.0
Abóbora (inclui butternut) NaN
Alface NaN
Alho NaN
Since I have many more columns than the ones shown here, I tried writing a "for" loop that runs over all the columns and then calls groupby with the other index levels, but it doesn't work.

You are looking for stack, which moves the column labels into a new innermost index level:
df = df.stack().to_frame().rename(columns={0:"Productivity"})
Full example:
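import pandas as pd

# Note: the "NaN" and "50.0" entries are strings here, so "sum" simply passes
# each one through (there is exactly one row per Region/years group).
df = pd.DataFrame(
    data=[
        ["PT", "1986", "NaN", "NaN", "NaN", "NaN"],
        ["PT", "1987", "NaN", "NaN", "NaN", "NaN"],
        ["PT", "1988", "NaN", "NaN", "NaN", "NaN"],
        ["PT", "1989", "NaN", "NaN", "NaN", "NaN"],
        ["PT", "1990", "NaN", "NaN", "NaN", "NaN"],
        ["3", "2017", "NaN", "NaN", "NaN", "NaN"],
        ["3", "2018", "NaN", "NaN", "NaN", "NaN"],
        ["3", "2019", "50.0", "NaN", "NaN", "NaN"],
        ["3", "2020", "50.0", "NaN", "NaN", "NaN"],
        ["3", "2021", "50.0", "NaN", "NaN", "NaN"],
    ],
    columns=["Region", "years", "Abacate", "Abóbora (inclui butternut)", "Alface", "Alho"],
)
# Build the (Region, years) index; the long column name is shortened to "Abóbora".
df = df.groupby(["Region", "years"]).agg(
    Abacate=("Abacate", "sum"),
    Abóbora=("Abóbora (inclui butternut)", "sum"),
    Alface=("Alface", "sum"),
    Alho=("Alho", "sum"),
)
# Name the column axis so the new index level is called "Products".
df = df.rename_axis("Products", axis="columns")
df = df.stack().to_frame().rename(columns={0: "Productivity"})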
Output:
Productivity
Region years Products
3 2017 Abacate NaN
Abóbora NaN
Alface NaN
Alho NaN
2018 Abacate NaN
Abóbora NaN
Alface NaN
Alho NaN
2019 Abacate 50.0
Abóbora NaN
Alface NaN
Alho NaN
2020 Abacate 50.0
Abóbora NaN
Alface NaN
Alho NaN
2021 Abacate 50.0
Abóbora NaN
Alface NaN
Alho NaN
PT 1986 Abacate NaN
Abóbora NaN
Alface NaN
Alho NaN
1987 Abacate NaN
Abóbora NaN
Alface NaN
Alho NaN
1988 Abacate NaN
Abóbora NaN
Alface NaN
Alho NaN
1989 Abacate NaN
Abóbora NaN
Alface NaN
Alho NaN
1990 Abacate NaN
Abóbora NaN
Alface NaN
Alho NaN
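With real NaN values (rather than the strings used in the example above), `stack()` may drop the all-missing rows, depending on the pandas version. Below is a minimal sketch that keeps them by using `melt` as a swapped-in alternative to `stack`. The two-row frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame: two index levels, real NaNs.
df = pd.DataFrame(
    {"Abacate": [np.nan, 50.0], "Alface": [np.nan, np.nan]},
    index=pd.MultiIndex.from_tuples(
        [("PT", 1986), ("3", 2019)], names=["Region", "years"]
    ),
)

# melt keeps NaN rows; ignore_index=False preserves the (Region, years) index.
long_df = (
    df.melt(ignore_index=False, var_name="Products", value_name="Productivity")
      .set_index("Products", append=True)  # column labels become the third level
      .sort_index()
)
print(long_df)
```

Every (Region, years, Products) combination survives here, including those whose Productivity is NaN.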

Related

Mask lots of missing Data in tricontourf

I have a relatively large dataset that contains data for a whole year. I built it by concatenating the dataframes for each day of year (doy), but on some days no data is available, so there are large gaps. I only want to plot the real data and mask (white out) the missing data. I tried resampling the data to hourly, but when I do this I get an "Error in qhull Delaunay triangulation calculation: input inconsistency (exitcode=1)".
At first I tried dropping the NaNs, but tricontourf then filled in the missing regions instead of ignoring or masking them. I came up with the solution below, but it masks only part of the points and fills the rest with artifacts.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.tri as tri
fig,ax=plt.subplots()
dy=devstns[0]
dy=dy.resample("H",base=1).mean()
dy["date"]=dy.index
dy["doy"] = dy["date"].apply(lambda x: x.timetuple().tm_yday)
dy =dy.fillna(0)
x=dy.doy.values
y=dy.UT.values
z=dy.TEC.values
bad = np.ma.masked_invalid(z)
isbad=np.equal(z,0)
triang = tri.Triangulation(x, y)
mask = np.any(np.where(isbad[triang.triangles], True, False), axis=1)
triang.set_mask(mask)
colplt = ax.tricontourf(triang, z)
Here is a data sample
|pctDev | doy | deltaTEC | QTEC | year | TEC | UT
date
2018-08-01 00:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 01:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 02:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 03:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 04:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 05:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 06:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 07:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 08:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 09:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 10:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 11:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 21:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 22:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 23:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 00:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 01:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 02:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 03:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 04:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 05:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 06:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 07:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 08:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-05 14:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-05 15:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-05 16:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-15 00:00:00 -33.568720 227.0 -2.578583 7.558583 2018.0 4.980000 0.491667
2018-08-15 01:00:00 -21.027371 227.0 -1.216333 5.755833 2018.0 4.539500 1.491667
2018-08-15 02:00:00 -11.645713 227.0 -0.593917 5.052917 2018.0 4.459000 2.491667
2018-08-15 03:00:00 -11.743647 227.0 -0.461083 3.936250 2018.0 3.475167 3.491667
2018-08-15 04:00:00 -5.666851 227.0 -0.184583 3.155417 2018.0 2.970833 4.491667
2018-08-15 05:00:00 -5.690906 227.0 -0.154583 2.702417 2018.0 2.547833 5.491667
2018-08-15 06:00:00 -16.918020 227.0 -0.469583 2.766583 2018.0 2.297000 6.491667
2018-08-15 07:00:00 -2.511416 227.0 -0.061917 2.550750 2018.0 2.488833 7.491667
tricontourf apparently does not accept NaN in the x and y arrays, so I filled in the missing time values just as I had for the Julian day; filling only the Julian day is probably why the mask was covering only half of the gap. In my case I used timetuple to fill in the missing Julian day and time. I think this lets the triangulation find the indices where z is NaN (set to zero) so that they can be masked.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.tri as tri
fig,ax=plt.subplots()
dy=devstns[0]
dy=dy.resample("H",base=1).mean()
dy["date"]=dy.index
dy["doy"] = dy["date"].apply(lambda x: x.timetuple().tm_yday)
dy["HH"] = dy["date"].apply(lambda x: x.timetuple().tm_hour)
dy =dy.fillna(0)
x=dy.doy.values
y=dy.UT.values
z=dy.TEC.values
isbad=np.equal(z,0)
triang = tri.Triangulation(x, y)
mask = np.any(np.where(isbad[triang.triangles], True, False), axis=1)
triang.set_mask(mask)
colplt = ax.tricontourf(triang, z)
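The masking idea can be tried on its own with synthetic data. The sketch below is not the poster's dataset: the grid, the gap between days 4 and 6, and the z function are all invented. It flags missing points with `np.isnan` rather than comparing against zero, so that genuine zero readings are not masked by accident.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.tri as tri
import numpy as np

# Synthetic stand-in: x is the day of year, y is the hour of day.
doy, hour = np.meshgrid(np.arange(1, 11), np.arange(0, 24))
x = doy.ravel().astype(float)
y = hour.ravel().astype(float)
z = np.sin(x / 3.0) + np.cos(y / 4.0)
z[(x >= 4) & (x <= 6)] = np.nan  # invented data gap on days 4 to 6

isbad = np.isnan(z)
triang = tri.Triangulation(x, y)
# Hide every triangle that has at least one missing corner.
mask = isbad[triang.triangles].any(axis=1)
triang.set_mask(mask)

# Placeholder values at the missing points; they only belong to masked triangles.
z_filled = np.where(isbad, 0.0, z)
fig, ax = plt.subplots()
colplt = ax.tricontourf(triang, z_filled)
fig.colorbar(colplt)
```

Because x and y are complete (only z has gaps), the triangulation itself never sees NaN, which is the condition described above.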

Is it possible to turn quarterly data to monthly?

I'm struggling with this problem and I'm not sure if I'm approaching it correctly.
I have this dataset:
ticker date filing_date_x currency_symbol_x researchdevelopment effectofaccountingcharges incomebeforetax minorityinterest netincome sellinggeneraladministrative grossprofit ebit nonoperatingincomenetother operatingincome otheroperatingexpenses interestexpense taxprovision interestincome netinterestincome extraordinaryitems nonrecurring otheritems incometaxexpense totalrevenue totaloperatingexpenses costofrevenue totalotherincomeexpensenet discontinuedoperations netincomefromcontinuingops netincomeapplicabletocommonshares preferredstockandotheradjustments filing_date_y currency_symbol_y totalassets intangibleassets earningassets othercurrentassets totalliab totalstockholderequity deferredlongtermliab ... totalcurrentliabilities shorttermdebt shortlongtermdebt shortlongtermdebttotal otherstockholderequity propertyplantequipment totalcurrentassets longterminvestments nettangibleassets shortterminvestments netreceivables longtermdebt inventory accountspayable totalpermanentequity noncontrollinginterestinconsolidatedentity temporaryequityredeemablenoncontrollinginterests accumulatedothercomprehensiveincome additionalpaidincapital commonstocktotalequity preferredstocktotalequity retainedearningstotalequity treasurystock accumulatedamortization noncurrrentassetsother deferredlongtermassetcharges noncurrentassetstotal capitalleaseobligations longtermdebttotal noncurrentliabilitiesother noncurrentliabilitiestotal negativegoodwill warrants preferredstockredeemable capitalsurpluse liabilitiesandstockholdersequity cashandshortterminvestments propertyplantandequipmentgross accumulateddepreciation commonstocksharesoutstanding
116638 JNJ.US 2019-12-31 2020-02-18 USD 3.232000e+09 NaN 4.218000e+09 NaN 4.010000e+09 6.039000e+09 1.363200e+10 6.119000e+09 6.500000e+07 4.238000e+09 NaN 85000000.0 208000000.0 81000000.0 -4000000.0 NaN 104000000.0 NaN 208000000.0 2.074700e+10 9.414000e+09 7.115000e+09 -1.200000e+08 NaN 4.010000e+09 4.010000e+09 NaN 2020-02-18 USD 1.577280e+11 4.764300e+10 NaN 2.486000e+09 9.825700e+10 5.947100e+10 5.958000e+09 ... 3.596400e+10 1.202000e+09 1.202000e+09 NaN -1.589100e+10 1.765800e+10 4.527400e+10 1.149000e+09 -2.181100e+10 1.982000e+09 1.448100e+10 2.649400e+10 9.020000e+09 3.476200e+10 NaN NaN NaN NaN NaN 3.120000e+09 NaN 1.106590e+11 -3.841700e+10 NaN 5.695000e+09 7.819000e+09 1.124540e+11 NaN 2.649400e+10 2.984100e+10 6.229300e+10 NaN NaN NaN NaN 1.577280e+11 1.928700e+10 NaN NaN 2.632507e+09
116569 JNJ.US 2020-03-31 2020-04-29 USD 2.580000e+09 NaN 6.509000e+09 NaN 5.796000e+09 5.203000e+09 1.364400e+10 8.581000e+09 7.460000e+08 5.788000e+09 NaN 25000000.0 713000000.0 67000000.0 42000000.0 300000000.0 58000000.0 NaN 713000000.0 2.069100e+10 7.135000e+09 7.047000e+09 6.210000e+08 NaN 5.796000e+09 5.796000e+09 NaN 2020-04-29 USD 1.550170e+11 4.733800e+10 NaN 2.460000e+09 9.372300e+10 6.129400e+10 5.766000e+09 ... 3.368900e+10 2.190000e+09 2.190000e+09 NaN -1.624300e+10 1.740100e+10 4.422600e+10 NaN -1.951500e+10 2.494000e+09 1.487400e+10 2.539300e+10 8.868000e+09 3.149900e+10 NaN NaN NaN NaN NaN 3.120000e+09 NaN 1.129010e+11 -3.848400e+10 NaN 5.042000e+09 NaN 7.539000e+09 NaN 2.539300e+10 2.887500e+10 6.003400e+10 NaN NaN NaN NaN 1.550170e+11 1.802400e+10 4.324700e+10 -2.584600e+10 2.632392e+09
116420 JNJ.US 2020-06-30 2020-07-24 USD 2.707000e+09 NaN 3.940000e+09 NaN 3.626000e+09 4.993000e+09 1.177900e+10 5.711000e+09 -5.000000e+06 3.990000e+09 NaN 45000000.0 314000000.0 19000000.0 -26000000.0 NaN 67000000.0 NaN 314000000.0 1.833600e+10 7.839000e+09 6.557000e+09 -8.500000e+07 NaN 3.626000e+09 3.626000e+09 NaN 2020-07-24 USD 1.583800e+11 4.741300e+10 NaN 2.688000e+09 9.540200e+10 6.297800e+10 5.532000e+09 ... 3.677200e+10 5.332000e+09 5.332000e+09 NaN -1.553300e+10 1.759800e+10 4.589200e+10 NaN -1.832500e+10 7.961000e+09 1.464500e+10 2.506200e+10 9.424000e+09 3.144000e+10 NaN NaN NaN NaN NaN 3.120000e+09 NaN 1.138980e+11 -3.850700e+10 NaN 5.782000e+09 NaN 7.805000e+09 NaN 2.506200e+10 2.803600e+10 5.863000e+10 NaN NaN NaN NaN 1.583800e+11 1.913500e+10 4.405600e+10 -2.645800e+10 2.632377e+09
116235 JNJ.US 2020-09-30 2020-10-23 USD 2.840000e+09 NaN 4.401000e+09 NaN 3.554000e+09 5.431000e+09 1.411000e+10 4.445000e+09 -1.188000e+09 5.633000e+09 NaN 44000000.0 847000000.0 12000000.0 -32000000.0 NaN 206000000.0 NaN 847000000.0 2.108200e+10 8.477000e+09 6.972000e+09 -1.268000e+09 NaN 3.554000e+09 3.554000e+09 NaN 2020-10-23 USD 1.706930e+11 4.700600e+10 NaN 2.619000e+09 1.062200e+11 6.447300e+10 5.615000e+09 ... 3.884700e+10 5.078000e+09 5.078000e+09 NaN -1.493800e+10 1.785500e+10 5.757800e+10 NaN -1.684000e+10 1.181600e+10 1.457900e+10 3.268000e+10 9.599000e+09 3.376900e+10 NaN NaN NaN NaN NaN 3.120000e+09 NaN 1.148310e+11 -3.854000e+10 NaN 6.131000e+09 NaN 7.816000e+09 NaN 3.268000e+10 2.907800e+10 6.737300e+10 NaN NaN NaN NaN 1.706930e+11 3.078100e+10 4.516200e+10 -2.730700e+10 2.632167e+09
116135 JNJ.US 2020-12-31 2021-02-22 USD 4.032000e+09 NaN 1.647000e+09 NaN 1.738000e+09 6.457000e+09 1.466100e+10 1.734000e+09 -2.341000e+09 4.075000e+09 NaN 87000000.0 -91000000.0 13000000.0 -74000000.0 NaN 97000000.0 NaN -91000000.0 2.247500e+10 1.058600e+10 7.814000e+09 -2.414000e+09 NaN 1.738000e+09 1.738000e+09 NaN 2021-02-22 USD 1.748940e+11 5.340200e+10 NaN 3.132000e+09 1.116160e+11 6.327800e+10 7.214000e+09 ... 4.249300e+10 2.631000e+09 2.631000e+09 NaN -1.524200e+10 1.876600e+10 5.123700e+10 NaN -2.651700e+10 1.120000e+10 1.357600e+10 3.263500e+10 9.344000e+09 3.986200e+10 NaN NaN NaN NaN NaN 3.120000e+09 NaN 1.138900e+11 -3.849000e+10 NaN 6.562000e+09 NaN 8.534000e+09 NaN 3.263500e+10 2.927400e+10 6.912300e+10 NaN NaN NaN NaN 1.748940e+11 2.518500e+10 NaN NaN 2.632512e+09
Then I have this dataframe of daily prices:
ticker date open high low close adjusted_close volume
0 JNJ.US 2021-08-02 172.470 172.840 171.300 172.270 172.2700 3620659
1 JNJ.US 2021-07-30 172.540 172.980 171.840 172.200 172.2000 5346400
2 JNJ.US 2021-07-29 172.740 173.340 171.090 172.180 172.1800 4214100
3 JNJ.US 2021-07-28 172.730 173.380 172.080 172.180 172.1800 5750700
4 JNJ.US 2021-07-27 171.800 172.720 170.670 172.660 172.6600 7089300
The price data is daily, but the first data frame is quarterly. I want to merge the data frames so that all the prices between Jan-01-2020 and Mar-01-2020 are merged with the correct quarterly row.
I'm not sure exactly how to do this. I thought of reducing the date to month-year, but I still don't know how to merge based on a range of values.
Any suggestions would be welcome; if I'm not clear, please let me know and I can clarify.
If I understand correctly, you could create common year and quarter columns in each DataFrame and merge on those columns. I used a left merge, which keeps every row of the left dataset (the daily data) and attaches the matching quarterly row.
If this is not what you are looking for, could you please clarify with a sample input/output?
# importing pandas as pd
import pandas as pd
# Creating dummy data of daily values
dt = pd.Series(['2020-08-02', '2020-07-30', '2020-07-29',
'2020-07-28', '2020-07-27'])
# Convert the underlying data to datetime
dt = pd.to_datetime(dt)
dt_df = pd.DataFrame(dt, columns=['date'])
dt_df['quarter_1'] = dt_df['date'].dt.quarter
dt_df['year_1'] = dt_df['date'].dt.year
print(dt_df)
date quarter_1 year_1
0 2020-08-02 3 2020
1 2020-07-30 3 2020
2 2020-07-29 3 2020
3 2020-07-28 3 2020
4 2020-07-27 3 2020
# Creating dummy data of quarterly values
dt2 = pd.Series(['2019-12-31', '2020-03-31', '2020-06-30',
'2020-09-30', '2020-12-31'])
# Convert the underlying data to datetime
dt2 = pd.to_datetime(dt2)
dt2_df = pd.DataFrame({'date_quarter': dt2})
dt2_df['quarter_2'] = dt2_df['date_quarter'].dt.quarter
dt2_df['year_2'] = dt2_df['date_quarter'].dt.year
print(dt2_df)
date_quarter quarter_2 year_2
0 2019-12-31 4 2019
1 2020-03-31 1 2020
2 2020-06-30 2 2020
3 2020-09-30 3 2020
4 2020-12-31 4 2020
Then you can merge however you want, for example:
dt_df.merge(dt2_df, how='left', left_on=['quarter_1', 'year_1'], right_on=['quarter_2', 'year_2'] , validate="many_to_many")
OUTPUT:
date quarter_1 year_1 date_quarter quarter_2 year_2
0 2020-08-02 3 2020 2020-09-30 3 2020
1 2020-07-30 3 2020 2020-09-30 3 2020
2 2020-07-29 3 2020 2020-09-30 3 2020
3 2020-07-28 3 2020 2020-09-30 3 2020
4 2020-07-27 3 2020 2020-09-30 3 2020
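The same idea can be written with a single quarter key instead of separate year and quarter columns. This is a sketch under assumptions: the daily dates and close prices are made up, and only the `netincome` figures come from the question's table.

```python
import pandas as pd

# Made-up daily prices (dates and values are illustrative only).
prices = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-14", "2020-03-30", "2020-05-04", "2020-08-03"]),
    "close": [150.0, 133.0, 148.0, 147.0],
})
# Quarterly rows keyed by quarter-end date; netincome as shown in the question.
fundamentals = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-31", "2020-06-30", "2020-09-30"]),
    "netincome": [5.796e9, 3.626e9, 3.554e9],
})

# One Period value such as 2020Q1 replaces the year + quarter column pair.
prices["quarter"] = prices["date"].dt.to_period("Q")
fundamentals["quarter"] = fundamentals["date"].dt.to_period("Q")

# Each daily row picks up the single quarterly row for its quarter.
merged = prices.merge(
    fundamentals.drop(columns="date"),
    on="quarter",
    how="left",
    validate="many_to_one",
)
print(merged)
```

Note that this matches by calendar quarter, so a price from early in a quarter is paired with figures that were only filed later. If that matters, `pd.merge_asof` on the filing date is one alternative to consider.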

Converting Annual and Monthly data to weekly in Python

My current data has variables recorded at different time intervals, and I want all variables cleaned and aligned in a weekly format, either by redistribution (weekly = monthly/4) or by filling in the monthly value for each week (weekly = monthly).
df=pd.DataFrame({
'Date':['2020-06-03','2020-06-08','2020-06-15','2020-06-22','2020-06-29','2020-07-15','2020-08-15','2020-09-15','2020-10-14','2020-11-15','2020-12-15','2020-12-31','2021-01-15'],
'Date_Type':['Week_start_Mon','Week_start_Mon','Week_start_Mon','Week_start_Mon','Week_start_Mon','Monthly','Monthly','Monthly','Monthly','Monthly','Annual','Annual','Annual'],
'Var_Name':['A','A','A','A','B','C','C','C','E','F','G','G','H'],
'Var_Value':
[150,50,0,200,800,5000,2000,6000.15000,2300,3300,650000,980000,1240000]})
Date Date_Type Var_Name Var_Value
0 2020-06-03 Week_start_Mon A 150.0
1 2020-06-08 Week_start_Mon A 50.0
2 2020-06-15 Week_start_Mon A 0.0
3 2020-06-22 Week_start_Mon A 200.0
4 2020-06-29 Week_start_Mon B 800.0
5 2020-07-15 Monthly C 5000.0
6 2020-08-15 Monthly C 2000.0
7 2020-09-15 Monthly C 6000.15
8 2020-10-14 Monthly E 2300.0
9 2020-11-15 Monthly F 3300.0
10 2020-12-15 Annual G 650000.0
11 2020-12-31 Annual G 980000.0
12 2021-01-15 Annual H 1240000.0
An ideal output will look like this:
For variable C, the date range runs from the start date to the end date of the master df. All dates are aligned to the Monday of their week. The monthly value is distributed evenly over the weeks of its month, and each week in June gets 0.
Similarly, annual variables will be distributed over 52 weeks.
Date Date_Type Var_Name Var_Value
0 2020-06-01 Monthly C 0
1 2020-06-08 Monthly C 0
2 2020-06-15 Monthly C 0
3 2020-06-22 Monthly C 0
4 2020-06-29 Monthly C 0
5 2020-07-06 Monthly C 1250
6 2020-07-13 Monthly C 1250
7 2020-07-20 Monthly C 1250
8 2020-07-27 Monthly C 1250
9 2020-08-03 Monthly C 400
10 2020-08-10 Monthly C 400
11 2020-08-17 Monthly C 400
12 2020-08-24 Monthly C 400
13 2020-08-31 Monthly C 400
.
.
.
to the end date
For variable E, a percentage value that needs to be filled in for every week where it applies, the output would look like this:
Date Date_Type Var_Name Var_Value
0 2020-06-01 Monthly E 0
1 2020-06-08 Monthly E 0
2 2020-06-15 Monthly E 0
3 2020-06-22 Monthly E 0
.
.
.
5 2020-09-28 Monthly E 0
6 2020-10-05 Monthly E 0.35
7 2020-10-12 Monthly E 0.35
8 2020-10-19 Monthly E 0.35
9 2020-10-26 Monthly E 0.35
10 2020-11-02 Monthly E 0
11 2020-11-09 Monthly E 0
12 2020-11-16 Monthly E 0
Ultimately my goal is to create a loop for treating this kind of data
if weekly
xxxxx
if monthly
xxxxx
if annual
xxxxx
Please help!
This is a partial answer; I need some clarification before I can finish it.
Set Date as the index and realign all dates to the Monday of their week (I assume Date already has a datetime64 dtype):
df = df.set_index("Date")
df.index = df.index.map(lambda d: d - pd.tseries.offsets.Day(d.weekday()))
>>> df
Date_Type Var_Name Var_Value
Date
2020-06-01 Weekly A 150.00
2020-06-08 Weekly A 50.00
2020-06-15 Weekly A 0.00
2020-06-22 Weekly A 200.00
2020-06-29 Weekly B 800.00
2020-07-13 Monthly C 5000.00
2020-08-10 Monthly C 2000.00
2020-09-14 Monthly C 6000.15
2020-10-12 Monthly E 2300.00
2020-11-09 Monthly F 3300.00
2020-12-14 Annual G 650000.00
2020-12-28 Annual G 980000.00
2021-01-11 Annual H 1240000.00
Create the index for each variable from 2020-06-01 to 2021-01-11 with a frequency of 7 days:
dti = pd.date_range(df.index.min(), df.index.max(), freq="7D", name="Date")
>>> dti
DatetimeIndex(['2020-06-01', '2020-06-08', '2020-06-15', '2020-06-22',
'2020-06-29', '2020-07-06', '2020-07-13', '2020-07-20',
'2020-07-27', '2020-08-03', '2020-08-10', '2020-08-17',
'2020-08-24', '2020-08-31', '2020-09-07', '2020-09-14',
'2020-09-21', '2020-09-28', '2020-10-05', '2020-10-12',
'2020-10-19', '2020-10-26', '2020-11-02', '2020-11-09',
'2020-11-16', '2020-11-23', '2020-11-30', '2020-12-07',
'2020-12-14', '2020-12-21', '2020-12-28', '2021-01-04',
'2021-01-11'],
dtype='datetime64[ns]', name='Date', freq='7D')
Reindex your dataframe with the new index (pivot for a better display):
df = df.pivot(columns=["Date_Type", "Var_Name"], values="Var_Value").reindex(dti)
>>> df
Date_Type Weekly Monthly Annual
Var_Name A B C E F G H
Date
2020-06-01 150.0 NaN NaN NaN NaN NaN NaN
2020-06-08 50.0 NaN NaN NaN NaN NaN NaN
2020-06-15 0.0 NaN NaN NaN NaN NaN NaN
2020-06-22 200.0 NaN NaN NaN NaN NaN NaN
2020-06-29 NaN 800.0 NaN NaN NaN NaN NaN
2020-07-06 NaN NaN NaN NaN NaN NaN NaN
2020-07-13 NaN NaN 5000.00 NaN NaN NaN NaN
2020-07-20 NaN NaN NaN NaN NaN NaN NaN
2020-07-27 NaN NaN NaN NaN NaN NaN NaN
2020-08-03 NaN NaN NaN NaN NaN NaN NaN
2020-08-10 NaN NaN 2000.00 NaN NaN NaN NaN
2020-08-17 NaN NaN NaN NaN NaN NaN NaN
2020-08-24 NaN NaN NaN NaN NaN NaN NaN
2020-08-31 NaN NaN NaN NaN NaN NaN NaN
2020-09-07 NaN NaN NaN NaN NaN NaN NaN
2020-09-14 NaN NaN 6000.15 NaN NaN NaN NaN
2020-09-21 NaN NaN NaN NaN NaN NaN NaN
2020-09-28 NaN NaN NaN NaN NaN NaN NaN
2020-10-05 NaN NaN NaN NaN NaN NaN NaN
2020-10-12 NaN NaN NaN 2300.0 NaN NaN NaN
2020-10-19 NaN NaN NaN NaN NaN NaN NaN
2020-10-26 NaN NaN NaN NaN NaN NaN NaN
2020-11-02 NaN NaN NaN NaN NaN NaN NaN
2020-11-09 NaN NaN NaN NaN 3300.0 NaN NaN
2020-11-16 NaN NaN NaN NaN NaN NaN NaN
2020-11-23 NaN NaN NaN NaN NaN NaN NaN
2020-11-30 NaN NaN NaN NaN NaN NaN NaN
2020-12-07 NaN NaN NaN NaN NaN NaN NaN
2020-12-14 NaN NaN NaN NaN NaN 650000.0 NaN
2020-12-21 NaN NaN NaN NaN NaN NaN NaN
2020-12-28 NaN NaN NaN NaN NaN 980000.0 NaN
2021-01-04 NaN NaN NaN NaN NaN NaN NaN
2021-01-11 NaN NaN NaN NaN NaN NaN 1240000.0
All that remains is to fill in the missing values. That is easy once I know how each case should be handled:
if weekly
xxxxx
if monthly
xxxxx
if annual
xxxxx
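For the monthly case, one possible rule is to split each monthly value evenly over the Mondays that fall in that month. The sketch below assumes that rule (it is not confirmed by the asker) and uses only variable C's July and August values; the frame and column names `month`, `monthly_total` and `weeks_in_month` are invented for illustration.

```python
import pandas as pd

# Variable C's monthly values from the question (July and August only).
monthly = pd.Series(
    [5000.0, 2000.0], index=pd.to_datetime(["2020-07-15", "2020-08-15"])
)

# One row per Monday across the range of interest.
weeks = pd.date_range("2020-06-01", "2020-08-31", freq="W-MON")
frame = pd.DataFrame({"Date": weeks})
frame["month"] = frame["Date"].dt.strftime("%Y-%m")

# Total per calendar month, keyed by the same "YYYY-MM" string.
by_month = monthly.groupby(monthly.index.strftime("%Y-%m")).sum()

# Months without a value get 0; every other month is split over its Mondays.
frame["monthly_total"] = frame["month"].map(by_month).fillna(0.0)
frame["weeks_in_month"] = frame.groupby("month")["Date"].transform("count")
frame["Var_Value"] = frame["monthly_total"] / frame["weeks_in_month"]
print(frame[["Date", "Var_Value"]])
```

Under this rule July (four Mondays) gets 1250 per week and August (five Mondays) gets 400 per week, which agrees with the expected output for variable C in the question. The fill-forward case (weekly = monthly) would use `monthly_total` directly instead of dividing.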

How to remove periods of time in a dataframe?

I have this df:
CODE YEAR MONTH DAY TMAX TMIN PP BAD PERIOD 1 BAD PERIOD 2
9984 000130 1991 1 1 32.6 23.4 0.0 1991 1998
9985 000130 1991 1 2 31.2 22.4 0.0 NaN NaN
9986 000130 1991 1 3 32.0 NaN 0.0 NaN NaN
9987 000130 1991 1 4 32.2 23.0 0.0 NaN NaN
9988 000130 1991 1 5 30.5 22.0 0.0 NaN NaN
... ... ... ... ... ... ...
20118 000130 2018 9 30 31.8 21.2 NaN NaN NaN
30028 000132 1991 1 1 35.2 NaN 0.0 2005 2010
30029 000132 1991 1 2 34.6 NaN 0.0 NaN NaN
30030 000132 1991 1 3 35.8 NaN 0.0 NaN NaN
30031 000132 1991 1 4 34.8 NaN 0.0 NaN NaN
... ... ... ... ... ... ...
50027 000132 2019 10 5 36.5 NaN 13.1 NaN NaN
50028 000133 1991 1 1 36.2 NaN 0.0 1991 2010
50029 000133 1991 1 2 36.6 NaN 0.0 NaN NaN
50030 000133 1991 1 3 36.8 NaN 5.0 NaN NaN
50031 000133 1991 1 4 36.8 NaN 0.0 NaN NaN
... ... ... ... ... ... ...
54456 000133 2019 10 5 36.5 NaN 12.1 NaN NaN
I want to set the values of the columns TMAX, TMIN and PP to NaN, but only for the periods specified in BAD PERIOD 1 and BAD PERIOD 2, and only within their respective CODE. For example, if BAD PERIOD 1 is 1991 and BAD PERIOD 2 is 1998 for code 000130, I want all values of TMAX, TMIN and PP with code 000130 to be NaN from 1991 (BAD PERIOD 1) to 1998 (BAD PERIOD 2). I have 371 unique codes in the CODE column, so I might use df.groupby("CODE").
Expected result after the change:
CODE YEAR MONTH DAY TMAX TMIN PP BAD PERIOD 1 BAD PERIOD 2
9984 000130 1991 1 1 NaN NaN NaN 1991 1998
9985 000130 1991 1 2 NaN NaN NaN NaN NaN
9986 000130 1991 1 3 NaN NaN NaN NaN NaN
9987 000130 1991 1 4 NaN NaN NaN NaN NaN
9988 000130 1991 1 5 NaN NaN NaN NaN NaN
... ... ... ... ... ... ...
20118 000130 2018 9 30 31.8 21.2 NaN NaN NaN
30028 000132 1991 1 1 35.2 NaN 0.0 2005 2010
30029 000132 1991 1 2 34.6 NaN 0.0 NaN NaN
30030 000132 1991 1 3 35.8 NaN 0.0 NaN NaN
30031 000132 1991 1 4 34.8 NaN 0.0 NaN NaN
... ... ... ... ... ... ...
50027 000132 2019 10 5 36.5 NaN 13.1 NaN NaN
50028 000133 1991 1 1 NaN NaN NaN 1991 2010
50029 000133 1991 1 2 NaN NaN NaN NaN NaN
50030 000133 1991 1 3 NaN NaN NaN NaN NaN
50031 000133 1991 1 4 NaN NaN NaN NaN NaN
... ... ... ... ... ... ...
54456 000133 2019 10 5 36.5 NaN 12.1 NaN NaN
You can propagate the values in your bad-period columns with ffill, provided the non-NaN values are always in the first row of each CODE group and your data is ordered by CODE. If not, use groupby.transform with first. Then use mask to replace values with NaN where YEAR lies between the two filled bad-period columns. (The columns are named BAD_1 and BAD_2 in the code below.)
df_ = df[['BAD_1', 'BAD_2']].ffill()
#or more flexible df_ = df.groupby("CODE")[['BAD_1', 'BAD_2']].transform('first')
cols = ['TMAX', 'TMIN', 'PP']
df[cols] = df[cols].mask(df['YEAR'].ge(df_['BAD_1'])
& df['YEAR'].le(df_['BAD_2']))
print(df)
CODE YEAR MONTH DAY TMAX TMIN PP BAD_1 BAD_2
9984 130 1991 1 1 NaN NaN NaN 1991.0 1998.0
9985 130 1991 1 2 NaN NaN NaN NaN NaN
9986 130 1991 1 3 NaN NaN NaN NaN NaN
9987 130 1991 1 4 NaN NaN NaN NaN NaN
9988 130 1991 1 5 NaN NaN NaN NaN NaN
20118 130 2018 9 30 31.8 21.2 NaN NaN NaN
30028 132 1991 1 1 35.2 NaN 0.0 2005.0 2010.0
30029 132 1991 1 2 34.6 NaN 0.0 NaN NaN
30030 132 1991 1 3 35.8 NaN 0.0 NaN NaN
30031 132 1991 1 4 34.8 NaN 0.0 NaN NaN
50027 132 2019 10 5 36.5 NaN 13.1 NaN NaN
50028 133 1991 1 1 NaN NaN NaN 1991.0 2010.0
50029 133 1991 1 2 NaN NaN NaN NaN NaN
50030 133 1991 1 3 NaN NaN NaN NaN NaN
50031 133 1991 1 4 NaN NaN NaN NaN NaN
54456 133 2019 10 5 36.5 NaN 12.1 NaN NaN
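A self-contained version of the `groupby.transform('first')` variant, on a made-up five-row frame that uses the same column names as the code above (BAD_1 and BAD_2 stand in for the question's two bad-period columns):

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the bounds sit on the first row of each CODE group.
df = pd.DataFrame({
    "CODE": ["000130", "000130", "000130", "000132", "000132"],
    "YEAR": [1991, 1998, 2018, 1991, 2019],
    "TMAX": [32.6, 31.2, 31.8, 35.2, 36.5],
    "TMIN": [23.4, 22.4, 21.2, np.nan, np.nan],
    "PP": [0.0, 0.0, np.nan, 0.0, 13.1],
    "BAD_1": [1991, np.nan, np.nan, 2005, np.nan],
    "BAD_2": [1998, np.nan, np.nan, 2010, np.nan],
})

# Broadcast each group's first non-null bound to every row of that group.
bounds = df.groupby("CODE")[["BAD_1", "BAD_2"]].transform("first")

# True where the row's YEAR falls inside its own code's bad period (inclusive).
in_bad = df["YEAR"].between(bounds["BAD_1"], bounds["BAD_2"])

cols = ["TMAX", "TMIN", "PP"]
df.loc[in_bad, cols] = np.nan
print(df)
```

Only the 1991 and 1998 rows of code 000130 are blanked here; code 000132 has no rows between 2005 and 2010 in this sample, so it is left unchanged.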

How can I unpivot data with multiple columns and multiple variables in pandas?

My input:
And desired output:
Drop the NaNs from each row, take the period name from the remaining column label, and append the values row by row to build a new DataFrame.
product ene ene_total feb feb_total mar mar_total
0 A NaN NaN 2.0 218.75 NaN NaN
1 B NaN NaN 1.0 27.40 NaN NaN
2 C NaN NaN NaN NaN 24.0 1530.00
3 D NaN NaN NaN NaN 24.0 1102.50
4 E NaN NaN NaN NaN 12.0 206.79
5 F NaN NaN NaN NaN 24.0 317.14
6 G 6.0 98.89 NaN NaN NaN NaN
7 H NaN NaN NaN NaN 24.0 385.29
8 I NaN NaN NaN NaN 25.0 895.98
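# Collect the rows in a list (DataFrame.append was removed in pandas 2.0).
rows = []
for i in range(len(df)):
    # After dropna, each row holds: product, the month's unit, the month's total.
    tmp = df.iloc[i].dropna()
    rows.append([tmp.index[1], tmp.iloc[0], tmp.iloc[1], tmp.iloc[2]])
new_df = pd.DataFrame(rows)  # columns are 0, 1, 2, 3
new_df.rename(columns={0:'period', 2:'unit', 3:'total'}).set_index(1)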
period unit total
1
A feb 2.0 218.75
B feb 1.0 27.40
C mar 24.0 1530.00
D mar 24.0 1102.50
E mar 12.0 206.79
F mar 24.0 317.14
G ene 6.0 98.89
H mar 24.0 385.29
I mar 25.0 895.98
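A loop-free alternative is to melt the unit columns and the `_total` columns separately and join them on product and period. This is a sketch on three of the rows above, and it assumes every month follows the `<month>` / `<month>_total` naming pattern.

```python
import numpy as np
import pandas as pd

# Three rows taken from the table above.
df = pd.DataFrame({
    "product": ["A", "C", "G"],
    "ene": [np.nan, np.nan, 6.0],
    "ene_total": [np.nan, np.nan, 98.89],
    "feb": [2.0, np.nan, np.nan],
    "feb_total": [218.75, np.nan, np.nan],
    "mar": [np.nan, 24.0, np.nan],
    "mar_total": [np.nan, 1530.0, np.nan],
})
months = ["ene", "feb", "mar"]

# Long form of the unit columns: one row per (product, period).
unit = df.melt(id_vars="product", value_vars=months,
               var_name="period", value_name="unit")
# Long form of the *_total columns, with the suffix stripped to get the period.
total = df.melt(id_vars="product", value_vars=[m + "_total" for m in months],
                var_name="period", value_name="total")
total["period"] = total["period"].str.replace("_total", "", regex=False)

# Join the two long frames and keep only the periods that hold data.
long_df = (
    unit.merge(total, on=["product", "period"])
        .dropna(subset=["unit", "total"])
        .sort_values("product")
        .reset_index(drop=True)
)
print(long_df)
```

Unlike the row-by-row approach, this also works when a product has data in more than one month, since each (product, period) pair becomes its own row.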