Mask lots of missing Data in tricontourf - matplotlib

I have a relatively large dataset that covers a whole year. I built it by concatenating the dataframes for each doy (day of year), but on some days there is no data available, so there are large gaps. I only want to plot the real data and mask (white out) the missing data. I tried resampling the data to hourly, but when I do this I get "Error in qhull Delaunay triangulation calculation: input inconsistency (exitcode=1)".
At first I tried to drop the NaNs; the problem is that tricontourf ended up filling in the missing data instead of ignoring or masking it. So I came up with the solution below, but it only masks part of the points and fills the other half with artifacts.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.tri as tri

fig, ax = plt.subplots()
dy = devstns[0]
dy = dy.resample("H", base=1).mean()   # hourly means
dy["date"] = dy.index
dy["doy"] = dy["date"].apply(lambda x: x.timetuple().tm_yday)
dy = dy.fillna(0)                      # zeros mark the missing rows
x = dy.doy.values
y = dy.UT.values
z = dy.TEC.values
isbad = np.equal(z, 0)
triang = tri.Triangulation(x, y)
# mask every triangle that has at least one bad vertex
mask = np.any(isbad[triang.triangles], axis=1)
triang.set_mask(mask)
colplt = ax.tricontourf(triang, z)
Here is a data sample:
                        pctDev    doy  deltaTEC      QTEC    year       TEC        UT
date
2018-08-01 00:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 01:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 02:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 03:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 04:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 05:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 06:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 07:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 08:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 09:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 10:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 11:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 21:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 22:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-01 23:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 00:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 01:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 02:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 03:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 04:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 05:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 06:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 07:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-02 08:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-05 14:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-05 15:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-05 16:00:00 NaN NaN NaN NaN NaN NaN NaN
2018-08-15 00:00:00 -33.568720 227.0 -2.578583 7.558583 2018.0 4.980000 0.491667
2018-08-15 01:00:00 -21.027371 227.0 -1.216333 5.755833 2018.0 4.539500 1.491667
2018-08-15 02:00:00 -11.645713 227.0 -0.593917 5.052917 2018.0 4.459000 2.491667
2018-08-15 03:00:00 -11.743647 227.0 -0.461083 3.936250 2018.0 3.475167 3.491667
2018-08-15 04:00:00 -5.666851 227.0 -0.184583 3.155417 2018.0 2.970833 4.491667
2018-08-15 05:00:00 -5.690906 227.0 -0.154583 2.702417 2018.0 2.547833 5.491667
2018-08-15 06:00:00 -16.918020 227.0 -0.469583 2.766583 2018.0 2.297000 6.491667
2018-08-15 07:00:00 -2.511416 227.0 -0.061917 2.550750 2018.0 2.488833 7.491667

tricontourf apparently does not accept NaN in the x and y arrays, so I filled in the missing x values just as I did for the Julian day, which is probably why it was only masking halfway before. In my case I used timetuple to fill in the missing Julian day and hour. I think this allows the triangulation to find the indices where z is NaN (set to zero) so they can be masked.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.tri as tri

fig, ax = plt.subplots()
dy = devstns[0]
dy = dy.resample("H", base=1).mean()
dy["date"] = dy.index
# rebuild day-of-year and hour for every row, so x and y contain no NaNs
dy["doy"] = dy["date"].apply(lambda x: x.timetuple().tm_yday)
dy["HH"] = dy["date"].apply(lambda x: x.timetuple().tm_hour)
dy = dy.fillna(0)
x = dy.doy.values
y = dy.UT.values
z = dy.TEC.values
isbad = np.equal(z, 0)                 # zeros mark the formerly missing rows
triang = tri.Triangulation(x, y)
# mask every triangle that has at least one bad vertex
mask = np.any(isbad[triang.triangles], axis=1)
triang.set_mask(mask)
colplt = ax.tricontourf(triang, z)
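For reference, here is a minimal self-contained sketch of the same masking idea on synthetic data (the grid shape, the gap location, and the variable names are invented for illustration). Triangles that touch any missing point are masked, so tricontourf leaves the gap white instead of interpolating across it:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri

# synthetic "doy vs hour" grid
doy, hour = np.meshgrid(np.arange(1, 31), np.arange(24))
x = doy.ravel().astype(float)
y = hour.ravel().astype(float)
z = np.sin(x / 5.0) * np.cos(y / 4.0)

# pretend days 10-15 have no data
z[(x >= 10) & (x <= 15)] = np.nan

isbad = np.isnan(z)
triang = tri.Triangulation(x, y)
# mask triangles with at least one missing vertex
triang.set_mask(np.any(isbad[triang.triangles], axis=1))

fig, ax = plt.subplots()
# the NaNs must still be replaced before contouring;
# the masked triangles keep them out of the plot
ax.tricontourf(triang, np.nan_to_num(z))
plt.show()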

Related

Is it possible to turn quarterly data into monthly?

I'm struggling with this problem and I'm not sure if I'm approaching it correctly.
I have this dataset:
(quarterly fundamentals for JNJ; the frame has dozens of income-statement and balance-sheet columns, truncated here for readability)
        ticker        date filing_date_x currency_symbol_x researchdevelopment  ...  commonstocksharesoutstanding
116638  JNJ.US  2019-12-31    2020-02-18               USD        3.232000e+09  ...                  2.632507e+09
116569  JNJ.US  2020-03-31    2020-04-29               USD        2.580000e+09  ...                  2.632392e+09
116420  JNJ.US  2020-06-30    2020-07-24               USD        2.707000e+09  ...                  2.632377e+09
116235  JNJ.US  2020-09-30    2020-10-23               USD        2.840000e+09  ...                  2.632167e+09
116135  JNJ.US  2020-12-31    2021-02-22               USD        4.032000e+09  ...                  2.632512e+09
Then I have this dataframe of daily prices:
ticker date open high low close adjusted_close volume
0 JNJ.US 2021-08-02 172.470 172.840 171.300 172.270 172.2700 3620659
1 JNJ.US 2021-07-30 172.540 172.980 171.840 172.200 172.2000 5346400
2 JNJ.US 2021-07-29 172.740 173.340 171.090 172.180 172.1800 4214100
3 JNJ.US 2021-07-28 172.730 173.380 172.080 172.180 172.1800 5750700
4 JNJ.US 2021-07-27 171.800 172.720 170.670 172.660 172.6600 7089300
I have daily data in the prices dataframe but quarterly data in the first dataframe. I want to merge them so that all the prices between, for example, Jan-01-2020 and Mar-01-2020 end up matched with the correct quarterly row.
I'm not sure exactly how to do this. I thought of extracting the date to month-year, but I still don't know how to merge based on a range of values.
Any suggestions would be welcome; if I'm not clear, please let me know and I can clarify.
If I understand correctly, you could create common year and quarter columns in each DataFrame and merge on those columns. I used a left merge, assuming you only want to keep the rows of the left dataset (the daily data).
If this is not what you are looking for, could you please clarify with a sample input/output?
# importing pandas as pd
import pandas as pd
# Creating dummy data of daily values
dt = pd.Series(['2020-08-02', '2020-07-30', '2020-07-29',
                '2020-07-28', '2020-07-27'])
# Convert the underlying data to datetime
dt = pd.to_datetime(dt)
dt_df = pd.DataFrame(dt, columns=['date'])
dt_df['quarter_1'] = dt_df['date'].dt.quarter
dt_df['year_1'] = dt_df['date'].dt.year
print(dt_df)
date quarter_1 year_1
0 2020-08-02 3 2020
1 2020-07-30 3 2020
2 2020-07-29 3 2020
3 2020-07-28 3 2020
4 2020-07-27 3 2020
# Creating dummy data of quarterly values
dt2 = pd.Series(['2019-12-31', '2020-03-31', '2020-06-30',
                 '2020-09-30', '2020-12-31'])
# Convert the underlying data to datetime
dt2 = pd.to_datetime(dt2)
dt2_df = pd.DataFrame(dt2, columns=['date_quarter'])
dt2_df['quarter_2'] = dt2_df['date_quarter'].dt.quarter
dt2_df['year_2'] = dt2_df['date_quarter'].dt.year
print(dt2_df)
date_quarter quarter_2 year_2
0 2019-12-31 4 2019
1 2020-03-31 1 2020
2 2020-06-30 2 2020
3 2020-09-30 3 2020
4 2020-12-31 4 2020
Then you can merge however you want.
dt_df.merge(dt2_df, how='left', left_on=['quarter_1', 'year_1'], right_on=['quarter_2', 'year_2'] , validate="many_to_many")
OUTPUT:
date quarter_1 year_1 date_quarter quarter_2 year_2
0 2020-08-02 3 2020 2020-09-30 3 2020
1 2020-07-30 3 2020 2020-09-30 3 2020
2 2020-07-29 3 2020 2020-09-30 3 2020
3 2020-07-28 3 2020 2020-09-30 3 2020
4 2020-07-27 3 2020 2020-09-30 3 2020
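If you instead want each daily price matched to the most recent quarter-end at or before it (a range join rather than an exact quarter/year match), pd.merge_asof may be worth a look. A minimal sketch using the dummy frames above (merge_asof requires both keys to be sorted):
# match each daily date to the latest quarter-end at or before it
merged = pd.merge_asof(dt_df.sort_values('date'),
                       dt2_df.sort_values('date_quarter'),
                       left_on='date', right_on='date_quarter',
                       direction='backward')
print(merged)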

pandas dataframe rebuild based on cells

After a lot of testing I have ended up with this df:
Date 1 2 3 4 5 6 7 8 9 10
0 2019-01-02 59.92 NaN NaN NaN NaN NaN NaN NaN NaN NaN
0 2019-01-02 NaN 197.28 NaN NaN NaN NaN NaN NaN NaN NaN
0 2019-01-02 NaN NaN 96.59 NaN NaN NaN NaN NaN NaN NaN
0 2019-01-02 NaN NaN NaN 275.0 NaN NaN NaN NaN NaN NaN
0 2019-01-02 NaN NaN NaN NaN 209.94 NaN NaN NaN NaN NaN
0 2019-01-02 NaN NaN NaN NaN NaN 99.83 NaN NaN NaN NaN
0 2019-01-02 NaN NaN NaN NaN NaN NaN 257.89 NaN NaN NaN
0 2019-01-02 NaN NaN NaN NaN NaN NaN NaN 215.54 NaN NaN
0 2019-01-02 NaN NaN NaN NaN NaN NaN NaN NaN 187.06 NaN
0 2019-01-02 NaN NaN NaN NaN NaN NaN NaN NaN NaN 386.9
It would be nice to have some kind of trick to put all these values on the same row. Any idea?
Thanks!!
Try via groupby() and sum():
df=df.groupby('Date').sum()
output:
Date 1 2 3 4 5 6 7 8 9 10
2019-01-02 59.92 197.28 96.59 275.0 209.94 99.83 257.89 215.54 187.06 386.9
An option is groupby with first, in case this needs to be performed across multiple dtypes where sum may not behave as expected:
df = df.groupby('Date', as_index=False).first()
Date 1 2 3 4 5 6 7 8 9 10
2019-01-02 59.92 197.28 96.59 275.0 209.94 99.83 257.89 215.54 187.06 386.9
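The difference shows up when a group has no value at all in some column: sum() turns an all-NaN group into 0 (unless you pass min_count=1), while first() keeps the NaN. A small sketch of that edge case (the two-row frame is invented for illustration):
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['2019-01-02', '2019-01-02'],
                   '1': [59.92, np.nan],
                   '2': [np.nan, np.nan]})      # column '2' has no value

print(df.groupby('Date').sum())                 # column '2' becomes 0.0
print(df.groupby('Date').sum(min_count=1))      # column '2' stays NaN
print(df.groupby('Date').first())               # column '2' stays NaN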

Converting Annual and Monthly data to weekly in Python

My current data has variables recorded at different time intervals, and I want all variables cleaned and nicely aligned in a weekly format, either by redistributing (weekly = monthly/4) or by filling in the monthly value for each week (weekly = monthly).
df = pd.DataFrame({
    'Date': ['2020-06-03', '2020-06-08', '2020-06-15', '2020-06-22',
             '2020-06-29', '2020-07-15', '2020-08-15', '2020-09-15',
             '2020-10-14', '2020-11-15', '2020-12-15', '2020-12-31',
             '2021-01-15'],
    'Date_Type': ['Week_start_Mon', 'Week_start_Mon', 'Week_start_Mon',
                  'Week_start_Mon', 'Week_start_Mon', 'Monthly', 'Monthly',
                  'Monthly', 'Monthly', 'Monthly', 'Annual', 'Annual',
                  'Annual'],
    'Var_Name': ['A', 'A', 'A', 'A', 'B', 'C', 'C', 'C', 'E', 'F', 'G',
                 'G', 'H'],
    'Var_Value': [150, 50, 0, 200, 800, 5000, 2000, 6000.15, 2300, 3300,
                  650000, 980000, 1240000]})
Date Date_Type Var_Name Var_Value
0 2020-06-03 Week_start_Mon A 150.0
1 2020-06-08 Week_start_Mon A 50.0
2 2020-06-15 Week_start_Mon A 0.0
3 2020-06-22 Week_start_Mon A 200.0
4 2020-06-29 Week_start_Mon B 800.0
5 2020-07-15 Monthly C 5000.0
6 2020-08-15 Monthly C 2000.0
7 2020-09-15 Monthly C 6000.15
8 2020-10-14 Monthly E 2300.0
9 2020-11-15 Monthly F 3300.0
10 2020-12-15 Annual G 650000.0
11 2020-12-31 Annual G 980000.0
12 2021-01-15 Annual H 1240000.0
An ideal output will look like this:
For variable C, the date range runs from the start to the end date of the master df, with all dates aligned to the Monday of their week. The monthly value is evenly distributed over the weeks of that month (5000/4 = 1250 for July's four weeks, 2000/5 = 400 for August's five), and there would be 0 for each week in June.
Similarly, annual variables would be distributed over 52 weeks.
Date Date_Type Var_Name Var_Value
0 2020-06-01 Monthly C 0
1 2020-06-08 Monthly C 0
2 2020-06-15 Monthly C 0
3 2020-06-22 Monthly C 0
4 2020-06-29 Monthly C 0
5 2020-07-06 Monthly C 1250
6 2020-07-13 Monthly C 1250
7 2020-07-20 Monthly C 1250
8 2020-07-27 Monthly C 1250
9 2020-08-03 Monthly C 400
10 2020-08-10 Monthly C 400
11 2020-08-17 Monthly C 400
12 2020-08-24 Monthly C 400
13 2020-08-31 Monthly C 400
.
.
.
to the end date
For variable E, a percentage value that needs to be filled in for every week where it applies, the output would look like this:
Date Date_Type Var_Name Var_Value
0 2020-06-01 Monthly E 0
1 2020-06-08 Monthly E 0
2 2020-06-15 Monthly E 0
3 2020-06-22 Monthly E 0
.
.
.
5 2020-09-28 Monthly E 0
6 2020-10-05 Monthly E 0.35
7 2020-10-12 Monthly E 0.35
8 2020-10-19 Monthly E 0.35
9 2020-10-26 Monthly E 0.35
10 2020-11-02 Monthly E 0
11 2020-11-09 Monthly E 0
12 2020-11-16 Monthly E 0
Ultimately, my goal is to create a loop for treating this kind of data:
if weekly
xxxxx
if monthly
xxxxx
if annual
xxxxx
Please help!
This is a partial answer; I need some clarification before I can complete it.
Set Date as index and realign all dates to Monday (I assume Date is already a datetime64 dtype)
df = df.set_index("Date")
df.index = df.index.map(lambda d: d - pd.tseries.offsets.Day(d.weekday()))
>>> df
Date_Type Var_Name Var_Value
Date
2020-06-01 Weekly A 150.00
2020-06-08 Weekly A 50.00
2020-06-15 Weekly A 0.00
2020-06-22 Weekly A 200.00
2020-06-29 Weekly B 800.00
2020-07-13 Monthly C 5000.00
2020-08-10 Monthly C 2000.00
2020-09-14 Monthly C 6000.15
2020-10-12 Monthly E 2300.00
2020-11-09 Monthly F 3300.00
2020-12-14 Annual G 650000.00
2020-12-28 Annual G 980000.00
2021-01-11 Annual H 1240000.00
Create the index for each variable from 2020-06-01 to 2021-01-11 with a frequency of 7 days:
dti = pd.date_range(df.index.min(), df.index.max(), freq="7D", name="Date")
>>> dti
DatetimeIndex(['2020-06-01', '2020-06-08', '2020-06-15', '2020-06-22',
'2020-06-29', '2020-07-06', '2020-07-13', '2020-07-20',
'2020-07-27', '2020-08-03', '2020-08-10', '2020-08-17',
'2020-08-24', '2020-08-31', '2020-09-07', '2020-09-14',
'2020-09-21', '2020-09-28', '2020-10-05', '2020-10-12',
'2020-10-19', '2020-10-26', '2020-11-02', '2020-11-09',
'2020-11-16', '2020-11-23', '2020-11-30', '2020-12-07',
'2020-12-14', '2020-12-21', '2020-12-28', '2021-01-04',
'2021-01-11'],
dtype='datetime64[ns]', name='Date', freq='7D')
Reindex your dataframe with the new index (pivot for a better display):
df = df.pivot(columns=["Date_Type", "Var_Name"], values="Var_Value").reindex(dti)
>>> df
Date_Type Weekly Monthly Annual
Var_Name A B C E F G H
Date
2020-06-01 150.0 NaN NaN NaN NaN NaN NaN
2020-06-08 50.0 NaN NaN NaN NaN NaN NaN
2020-06-15 0.0 NaN NaN NaN NaN NaN NaN
2020-06-22 200.0 NaN NaN NaN NaN NaN NaN
2020-06-29 NaN 800.0 NaN NaN NaN NaN NaN
2020-07-06 NaN NaN NaN NaN NaN NaN NaN
2020-07-13 NaN NaN 5000.00 NaN NaN NaN NaN
2020-07-20 NaN NaN NaN NaN NaN NaN NaN
2020-07-27 NaN NaN NaN NaN NaN NaN NaN
2020-08-03 NaN NaN NaN NaN NaN NaN NaN
2020-08-10 NaN NaN 2000.00 NaN NaN NaN NaN
2020-08-17 NaN NaN NaN NaN NaN NaN NaN
2020-08-24 NaN NaN NaN NaN NaN NaN NaN
2020-08-31 NaN NaN NaN NaN NaN NaN NaN
2020-09-07 NaN NaN NaN NaN NaN NaN NaN
2020-09-14 NaN NaN 6000.15 NaN NaN NaN NaN
2020-09-21 NaN NaN NaN NaN NaN NaN NaN
2020-09-28 NaN NaN NaN NaN NaN NaN NaN
2020-10-05 NaN NaN NaN NaN NaN NaN NaN
2020-10-12 NaN NaN NaN 2300.0 NaN NaN NaN
2020-10-19 NaN NaN NaN NaN NaN NaN NaN
2020-10-26 NaN NaN NaN NaN NaN NaN NaN
2020-11-02 NaN NaN NaN NaN NaN NaN NaN
2020-11-09 NaN NaN NaN NaN 3300.0 NaN NaN
2020-11-16 NaN NaN NaN NaN NaN NaN NaN
2020-11-23 NaN NaN NaN NaN NaN NaN NaN
2020-11-30 NaN NaN NaN NaN NaN NaN NaN
2020-12-07 NaN NaN NaN NaN NaN NaN NaN
2020-12-14 NaN NaN NaN NaN NaN 650000.0 NaN
2020-12-21 NaN NaN NaN NaN NaN NaN NaN
2020-12-28 NaN NaN NaN NaN NaN 980000.0 NaN
2021-01-04 NaN NaN NaN NaN NaN NaN NaN
2021-01-11 NaN NaN NaN NaN NaN NaN 1240000.0
It only remains to fill in the missing values. That can be easy once I know how you want to handle:
if weekly
xxxxx
if monthly
xxxxx
if annual
xxxxx
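That said, if the rules are the ones described in the question (weekly values kept with gaps as 0, a monthly value spread evenly over the Mondays of its calendar month, an annual value spread over the following 52 weeks), a hedged sketch of the filling step could look like this; the Annual rule in particular is my assumption:
# a sketch, assuming df is the pivoted frame above with the weekly
# DatetimeIndex and (Date_Type, Var_Name) MultiIndex columns
out = df.copy()
for (date_type, var_name), col in df.items():
    if date_type in ("Weekly", "Week_start_Mon"):
        out[(date_type, var_name)] = col.fillna(0)
    elif date_type == "Monthly":
        # split each month's single value evenly over that month's weekly
        # rows; months with no value come out as 0 (sum of all-NaN is 0)
        out[(date_type, var_name)] = (
            col.groupby(col.index.to_period("M"))
               .transform(lambda g: g.sum() / len(g))
        )
    elif date_type == "Annual":
        # assumption: each annual value contributes value/52 to the 52
        # weeks starting at its own week (a trailing rolling sum of the
        # value/52 impulse series)
        out[(date_type, var_name)] = (
            (col.fillna(0) / 52).rolling(window=52, min_periods=1).sum()
        )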

In Python, how can I update multiple rows in a DataFrame with a Series?

I have a dataframe as below.
a b c d
2010-07-23 NaN NaN NaN NaN
2010-07-26 NaN NaN NaN NaN
2010-07-27 NaN NaN NaN NaN
2010-07-28 NaN NaN NaN NaN
2010-07-29 NaN NaN NaN NaN
2010-07-30 NaN NaN NaN NaN
2010-08-02 NaN NaN NaN NaN
2010-08-03 NaN NaN NaN NaN
2010-08-04 NaN NaN NaN NaN
2010-08-05 NaN NaN NaN NaN
And I have a series as below.
2010-07-23
a 1
b 2
c 3
d 4
I want to update the DataFrame with the series as below. How can I do this?
a b c d
2010-07-23 NaN NaN NaN NaN
2010-07-26 1 2 3 4
2010-07-27 1 2 3 4
2010-07-28 1 2 3 4
2010-07-29 NaN NaN NaN NaN
2010-07-30 NaN NaN NaN NaN
2010-08-02 NaN NaN NaN NaN
2010-08-03 NaN NaN NaN NaN
2010-08-04 NaN NaN NaN NaN
2010-08-05 NaN NaN NaN NaN
Thank you very much in advance for the help.
If s is a one-column DataFrame rather than a Series, add DataFrame.squeeze, repeat the result with concat over the length of the date range, and finally pass it to DataFrame.update:
r = pd.date_range('2010-07-26', '2010-07-28')
# repeat the series once per date, label the copies with the dates,
# then transpose so the dates become the index
df.update(pd.concat([s.squeeze()] * len(r), axis=1, keys=r).T)
print(df)
a b c d
2010-07-23 NaN NaN NaN NaN
2010-07-26 1.0 2.0 3.0 4.0
2010-07-27 1.0 2.0 3.0 4.0
2010-07-28 1.0 2.0 3.0 4.0
2010-07-29 NaN NaN NaN NaN
2010-07-30 NaN NaN NaN NaN
2010-08-02 NaN NaN NaN NaN
2010-08-03 NaN NaN NaN NaN
2010-08-04 NaN NaN NaN NaN
2010-08-05 NaN NaN NaN NaN
Or you can use np.broadcast_to to repeat the Series:
r = pd.date_range('2010-07-26', '2010-07-28')
df1 = pd.DataFrame(np.broadcast_to(s.squeeze().values, (len(r), len(s))),
                   index=r,
                   columns=s.index)
print(df1)
a b c d
2010-07-26 1 2 3 4
2010-07-27 1 2 3 4
2010-07-28 1 2 3 4
df.update(df1)
print(df)
a b c d
2010-07-23 NaN NaN NaN NaN
2010-07-26 1.0 2.0 3.0 4.0
2010-07-27 1.0 2.0 3.0 4.0
2010-07-28 1.0 2.0 3.0 4.0
2010-07-29 NaN NaN NaN NaN
2010-07-30 NaN NaN NaN NaN
2010-08-02 NaN NaN NaN NaN
2010-08-03 NaN NaN NaN NaN
2010-08-04 NaN NaN NaN NaN
2010-08-05 NaN NaN NaN NaN
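Assuming the three target dates already exist in df's DatetimeIndex (and that s's rows are in the same order as df's columns), a plain .loc slice with broadcasting would be a shorter route; this is my sketch, not part of the original answer:
# the 1-D values broadcast across every row in the date slice
df.loc['2010-07-26':'2010-07-28', :] = s.squeeze().values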

How to calculate time between occurrence of an event in a timeseries dataframe

Let's say I have the following dataframe:
df
A B C D event
Timestamp
1991-04-21 09:09:00 9.0 13.0 NaN NaN 100.0
1991-04-21 17:08:00 7.0 NaN NaN NaN 119.0
1991-04-21 22:51:00 NaN NaN 123.0 NaN NaN
1991-04-22 07:35:00 10.0 13.0 NaN NaN 216.0
1991-04-22 13:40:00 2.0 NaN NaN NaN NaN
1991-04-22 16:56:00 7.0 NaN NaN NaN 211.0
using the code
df['delta_time'] = df['Timestamp'].diff().fillna(pd.Timedelta(0))
I get
Timestamp A B C D event delta_time
1991-04-21 09:09:00 9.0 13.0 NaN NaN 100.0 00:00:00
1991-04-21 17:08:00 7.0 NaN NaN NaN 119.0 07:59:00
1991-04-21 22:51:00 NaN NaN 123.0 NaN NaN 05:43:00
1991-04-22 07:35:00 10.0 13.0 NaN NaN 216.0 08:44:00
1991-04-22 13:40:00 2.0 NaN NaN NaN NaN 06:05:00
1991-04-22 16:56:00 7.0 NaN NaN NaN 211.0 03:16:00
1991-04-23 07:25:00 11.0 13.0 NaN NaN 257.0 14:29:00
but what I am looking for is
Timestamp
1991-04-21 09:09:00 9.0 13.0 NaN NaN 100.0 00:00:00
1991-04-21 17:08:00 7.0 NaN NaN NaN 119.0 07:59:00
1991-04-21 22:51:00 NaN NaN 123.0 NaN NaN NaN
1991-04-22 07:35:00 10.0 13.0 NaN NaN 216.0 14:27:00
1991-04-22 13:40:00 2.0 NaN NaN NaN NaN NaN
1991-04-22 16:56:00 7.0 NaN NaN NaN 211.0 09:21:00
1991-04-23 07:25:00 11.0 13.0 NaN NaN 257.0 14:29:00
I want to calculate the time that has elapsed every time an event occurs and omit the rows where the event is NaN. What would be the correct approach to write that code?
I'm assuming this is what you want. I don't know whether Timestamp is the index or not, but this will work if it's not the index:
In [251]:
df['delta_time'] = df.loc[df['event'].notnull(),'Timestamp'].diff()
df
Out[251]:
Timestamp A B C D event delta_time
0 1991-04-21 09:09:00 9.0 13.0 NaN NaN 100.0 NaT
1 1991-04-21 17:08:00 7.0 NaN NaN NaN 119.0 07:59:00
2 1991-04-21 22:51:00 NaN NaN 123.0 NaN NaN NaT
3 1991-04-22 07:35:00 10.0 13.0 NaN NaN 216.0 14:27:00
4 1991-04-22 13:40:00 2.0 NaN NaN NaN NaN NaT
5 1991-04-22 16:56:00 7.0 NaN NaN NaN 211.0 09:21:00
If needed, call reset_index to restore the index back as a column.
Basically you mask the rows of interest based on where 'event' is not null, and then call diff to get the inter-row difference.
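If Timestamp is the index instead, a hedged equivalent (my adaptation of the same masking idea):
# diff only the timestamps of rows where an event occurred;
# the remaining rows stay NaT after the aligned assignment
mask = df['event'].notnull()
df.loc[mask, 'delta_time'] = df.index.to_series()[mask].diff()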