seasonal_decompose: operands could not be broadcast together with shapes on a series - pandas

I know there are many questions on this topic, but none of them helped me to solve this problem. I'm really stuck on this.
With a simple series:
0
2016-01-31 266
2016-02-29 235
2016-03-31 347
2016-04-30 514
2016-05-31 374
2016-06-30 250
2016-07-31 441
2016-08-31 422
2016-09-30 323
2016-10-31 168
2016-11-30 496
2016-12-31 303
import numpy as np
import statsmodels.api as sm

logdf = np.log(df[0])
decompose = sm.tsa.seasonal_decompose(logdf, freq=12, model='additive')
decomplot = decompose.plot()
I keep getting: ValueError: operands could not be broadcast together with shapes (12,) (14,)
I've tried pretty much everything: passing only logdf.values, passing a non-log series. Nothing works.
statsmodels, pandas, and numpy versions:
print(statsmodels.__version__)  # 0.6.1
print(pd.__version__)           # 0.18.1
print(np.__version__)           # 1.11.3

As @yoonforh pointed out, in my case this was fixed by setting the freq parameter to less than the length of the time series. E.g. if your time series ts looks like this:
2014-01-01 0.0
2014-02-01 0.0
2014-03-01 1.0
2014-04-01 1.0
2014-05-01 0.0
2014-06-01 1.0
2014-07-01 1.0
2014-08-01 0.0
2014-09-01 0.0
2014-10-01 1.0
2014-11-01 0.0
2014-12-01 0.0
the shape is
(12,)
so this gives the error above:
seasonal_decompose(ts, freq=12, model='additive')
but if I try freq=11 or any other int less than 12, e.g.
seasonal_decompose(ts, freq=11, model='additive')
it works.
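For reference, here is a minimal, self-contained reproduction of that behavior; this is a sketch assuming an older statsmodels where the argument is still called freq (later releases renamed it to period):
import pandas as pd
import statsmodels.api as sm

# One year of monthly data: exactly one seasonal period of 12 observations.
idx = pd.date_range('2014-01-01', periods=12, freq='MS')
ts = pd.Series([0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0], index=idx, dtype=float)

# With freq=12 there is only one full period, so the moving-average trend
# filter has nothing to average over and old statsmodels raises the
# broadcast ValueError:
# sm.tsa.seasonal_decompose(ts, freq=12, model='additive')

# freq=11 (any int below the series length) goes through:
result = sm.tsa.seasonal_decompose(ts, freq=11, model='additive')
result.plot()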

I noticed that with newer pandas and statsmodels versions it seems to work.
Given a series:
2016-01-03 8.326275
2016-01-10 8.898229
2016-01-17 8.754792
2016-01-24 8.658172
2016-01-31 8.731659
2016-02-07 9.047233
2016-02-14 8.799662
2016-02-21 8.783549
2016-02-28 8.782783
2016-03-06 9.081825
2016-03-13 8.737934
2016-03-20 8.658693
2016-03-27 8.666475
2016-04-03 9.029178
2016-04-10 8.781555
2016-04-17 8.720787
2016-04-24 8.633909
2016-05-01 8.937744
2016-05-08 8.804925
2016-05-15 8.766862
2016-05-22 8.651899
2016-05-29 8.653645
...
And pd/sm version:
statsmodels.__version__ 0.8.0
pandas.__version__ 0.20.1
This now runs without the error:
import numpy as np
import statsmodels.api as sm
logdf = np.log(df_series)
decompose = sm.tsa.seasonal_decompose(logdf, model='additive', filt=None, freq=1, two_sided=True)
decompose.plot()
I hope this solves your problem too.
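One compatibility note: from statsmodels 0.11 onward the freq argument is called period, so an equivalent call would look roughly like this (period=52 is only an illustrative choice for weekly data with yearly seasonality, not something taken from the question):
# statsmodels >= 0.11 renamed `freq` to `period`:
decompose = sm.tsa.seasonal_decompose(logdf, model='additive', period=52)
decompose.plot()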

Related

pandas (multi) index wrong, need to change it

I have a DataFrame multiData that looks like this:
print(multiData)
Date Open High Low Close Adj Close Volume
Ticker Date
AAPL 0 2010-01-04 7.62 7.66 7.59 7.64 6.51 493729600
1 2010-01-05 7.66 7.70 7.62 7.66 6.52 601904800
2 2010-01-06 7.66 7.69 7.53 7.53 6.41 552160000
3 2010-01-07 7.56 7.57 7.47 7.52 6.40 477131200
4 2010-01-08 7.51 7.57 7.47 7.57 6.44 447610800
... ... ... ... ... ... ... ...
META 2668 2022-12-23 116.03 118.18 115.54 118.04 118.04 17796600
2669 2022-12-27 117.93 118.60 116.05 116.88 116.88 21392300
2670 2022-12-28 116.25 118.15 115.51 115.62 115.62 19612500
2671 2022-12-29 116.40 121.03 115.77 120.26 120.26 22366200
2672 2022-12-30 118.16 120.42 117.74 120.34 120.34 19492100
I need to get rid of the index level that is labeled Date but only holds the counters 0, 1, 2, ..., and make the actual Date column part of the MultiIndex instead.
How do I do this?
Use df.droplevel to delete level 1, and chain df.set_index to add the Date column to the index by setting the append parameter to True.
df = df.droplevel(1).set_index('Date', append=True)
df
Open High Low Close Adj Close Volume
Ticker Date
AAPL 2010-01-04 7.62 7.66 7.59 7.64 6.51 493729600
2010-01-05 7.66 7.70 7.62 7.66 6.52 601904800
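Here is a small self-contained sketch of the same idea (the two-row frame below is a made-up stand-in for multiData):
import pandas as pd

# Hypothetical miniature of the original frame: a (Ticker, counter)
# MultiIndex with Date as an ordinary column.
df = pd.DataFrame(
    {'Date': ['2010-01-04', '2010-01-05'], 'Close': [7.64, 7.66]},
    index=pd.MultiIndex.from_tuples([('AAPL', 0), ('AAPL', 1)],
                                    names=['Ticker', 'Date']),
)

# Drop the integer level, then append the Date column to the remaining index.
df = df.droplevel(1).set_index('Date', append=True)
print(df)  # index is now (Ticker, Date)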

Future dates calculating incorrectly in FBProphet - make_future_dataframe method

I'm trying to do a weekly forecast in FBProphet for just 5 weeks ahead. The make_future_dataframe method doesn't seem to be working right: it makes the correct one-week intervals except for a two-day step between Jul 3 and Jul 5; every other interval is correct at 7 days. Code and output below:
INPUT DATAFRAME
ds y
548 2010-01-01 3117
547 2010-01-08 2850
546 2010-01-15 2607
545 2010-01-22 2521
544 2010-01-29 2406
... ... ...
4 2020-06-05 2807
3 2020-06-12 2892
2 2020-06-19 3012
1 2020-06-26 3077
0 2020-07-03 3133
CODE
future = m.make_future_dataframe(periods=5, freq='W')
future.tail(9)
OUTPUT
ds
545 2020-06-12
546 2020-06-19
547 2020-06-26
548 2020-07-03
549 2020-07-05
550 2020-07-12
551 2020-07-19
552 2020-07-26
553 2020-08-02
All you need to do is create a dataframe with the dates you need and pass it to the predict method; using make_future_dataframe is not necessary. (The odd Jul 3 to Jul 5 step appears because freq='W' in pandas means week-ending-Sunday, so the generated dates snap to Sundays.) A minimal sketch of the manual approach follows.
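This sketch assumes m is the already-fitted Prophet model and that 2020-07-03 is the last training date, as in the question:
import pandas as pd

# Build the 5 future weekly dates by hand so every step is exactly 7 days.
last_date = pd.Timestamp('2020-07-03')
future = pd.DataFrame({'ds': pd.date_range(start=last_date + pd.Timedelta(days=7),
                                           periods=5, freq='7D')})
forecast = m.predict(future)
print(future)  # 2020-07-10 through 2020-08-07, all exactly a week apart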

Pandas - Group into 24-hour blocks, but not midnight-to-midnight

I have a time series. I'd like to group it into 24-hour blocks, from 8am to 7:59am the next day. I know how to group by date, but I've tried and failed to handle this 8-hour offset using TimeGroupers and DateOffsets.
I think you can use Grouper with the base parameter:
print df
date name
0 2015-06-13 00:21:25 1
1 2015-06-14 01:00:25 2
2 2015-06-14 02:54:48 3
3 2015-06-15 14:38:15 2
4 2015-06-15 15:29:28 1
print df.groupby(pd.Grouper(key='date', freq='24h', base=8)).sum()
name
date
2015-06-12 08:00:00 1.0
2015-06-13 08:00:00 5.0
2015-06-14 08:00:00 NaN
2015-06-15 08:00:00 3.0
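A compatibility note: pandas 1.1 deprecated the base argument in favor of offset, so on newer pandas the same grouping should be expressed like this:
# pandas >= 1.1: `offset` replaces the deprecated `base` parameter.
df.groupby(pd.Grouper(key='date', freq='24h', offset='8h')).sum()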
Alternatively to @jezrael's method, you can use a custom grouper function (rewritten here with .loc, since the original .ix accessor is deprecated):
start_ts = '2016-01-01 07:59:59'
df = pd.DataFrame({'Date': pd.date_range(start_ts, freq='10min', periods=1000)})

def my_grouper(df, idx):
    # Rows before 8am belong to the previous day's 24-hour block.
    ts = df.loc[idx, 'Date']
    return ts.date() if ts.hour >= 8 else ts.date() - pd.Timedelta('1day')

df.groupby(lambda x: my_grouper(df, x)).size()
Test:
In [468]: df.head()
Out[468]:
Date
0 2016-01-01 07:59:59
1 2016-01-01 08:09:59
2 2016-01-01 08:19:59
3 2016-01-01 08:29:59
4 2016-01-01 08:39:59
In [469]: df.tail()
Out[469]:
Date
995 2016-01-08 05:49:59
996 2016-01-08 05:59:59
997 2016-01-08 06:09:59
998 2016-01-08 06:19:59
999 2016-01-08 06:29:59
In [470]: df.groupby(lambda x: my_grouper(df, x)).size()
Out[470]:
2015-12-31 1
2016-01-01 144
2016-01-02 144
2016-01-03 144
2016-01-04 144
2016-01-05 144
2016-01-06 144
2016-01-07 135
dtype: int64

subtraction in SQL giving incorrect value

I have a table that contains Id, Date and a float value as below:
ID startDt Days
1328 2015-04-01 00:00:00.000 15
2444 2015-04-03 00:00:00.000 5.7
1658 2015-05-08 00:00:00.000 6
1329 2015-05-12 00:00:00.000 28.5
1849 2015-06-23 00:00:00.000 28.5
1581 2015-06-30 00:00:00.000 25.5
3535 2015-07-03 00:00:00.000 3
3536 2015-08-13 00:00:00.000 13.5
2166 2015-09-22 00:00:00.000 28.5
3542 2015-11-05 00:00:00.000 13.5
3543 2015-12-18 00:00:00.000 6
2445 2015-12-25 00:00:00.000 5.7
4096 2015-12-31 00:00:00.000 7.5
2446 2016-01-01 00:00:00.000 5.7
4287 2016-02-11 00:00:00.000 13.5
4288 2016-02-18 00:00:00.000 13.5
4492 2016-03-02 00:00:00.000 19.7
2447 2016-03-25 00:00:00.000 5.7
I am using a stored procedure which adds up the Days then subtracts it from a fixed value stored in a variable.
The total in the table is 245 and the variable is set to 245, so I should get 0 when subtracting the two. However, I am getting 5.6843418860808E-14 instead. I can't figure out why this is the case; I have even gone and re-entered each number in the table, but I still get the same result.
This is my sql statement that I am using to calculate the result:
Declare @AL_Taken as float
Declare @AL_Remaining as float
Declare @EntitledLeave as float
Set @EntitledLeave=245
Set @AL_Taken= (select sum(Days) from tblALMain)
Set @AL_Remaining=@EntitledLeave-@AL_Taken
Select @EntitledLeave, @AL_Taken, @AL_Remaining
The select returns the following:
245, 245, 5.6843418860808E-14
Can anyone suggest why I am getting this number when I should be getting 0?
Thanks for the help
Rob
I changed the data type to Decimal as Tab Allenman suggested, and this resolved my issue. I still don't understand why I didn't get zero when using float, as all the values added up to 245 exactly (I even re-entered the values manually) and 245 - 245 should have given me 0.
Thanks again for all the comments and explanations.
Rob
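For anyone wondering why: SQL's float is IEEE 754 binary floating point, and values like 5.7 have no exact base-2 representation, so every addition can carry a tiny rounding error that survives into the final subtraction; Decimal stores values in base 10 and avoids this. A quick sketch in Python, which uses the same 64-bit double type, makes the effect visible:
# The Days column from the question; the exact sum is 245.
days = [15, 5.7, 6, 28.5, 28.5, 25.5, 3, 13.5, 28.5,
        13.5, 6, 5.7, 7.5, 5.7, 13.5, 13.5, 19.7, 5.7]
print(245 - sum(days))  # typically a tiny residue on the order of 1e-14, not 0.0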

How to create new Pandas DataFrame with columns from another DataFrame (Python)

I am creating a DataFrame from a csv file, where my index (rows) is date and my column names are names of cities.
After I create the raw DataFrame, I am trying to create a DataFrame from selected columns. I have tried:
A = df['city1']  # city 1
B = df['city2']
C = pd.merge(A, B)
but it doesn't work. This is what A and B look like:
Date
2013-11-01 2.56
2013-12-01 1.77
2014-01-01 0.00
2014-02-01 0.38
2014-03-01 13.16
2014-04-01 10.29
2014-05-01 15.43
2014-06-01 11.48
2014-07-01 8.54
2014-08-01 11.11
2014-09-01 2.71
2014-10-01 4.16
2014-11-01 13.01
2014-12-01 9.59
Name: Seattle.Washington, dtype: float64
And this is what I am looking to create:
City1 City2
Date
2013-11-01 0.00 2.94
2013-12-01 8.26 3.41
2014-01-01 1.11 14.27
2014-02-01 32.86 84.26
2014-03-01 34.12 0.00
2014-04-01 68.39 0.00
2014-05-01 27.17 9.09
2014-06-01 10.47 32.00
2014-07-01 14.19 26.83
2014-08-01 14.91 6.36
2014-09-01 3.76 8.32
2014-10-01 5.83 2.19
2014-11-01 10.79 2.64
2014-12-01 21.24 8.08
Any suggestion?
Error Message:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-222-ec50ff9f372f> in <module>()
14 S = df['City1']
15 A = df['City2']
16
---> 17 print merge(S,A)
18 #df2=pd.merge(A,A)
19 #print df2
C:\...\merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
36 right_on=right_on, left_index=left_index,
37 right_index=right_index, sort=sort, suffixes=suffixes,
---> 38 copy=copy)
39 return op.get_result()
40 if __debug__:
Answer (courtesy of @EdChum): select both columns at once with a list of column names:
df[['City1', 'City2']]
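If you already have the two Series A and B pulled out separately, a sketch of an alternative is to concatenate them along the columns; pd.merge failed here because it expects DataFrames with key columns to join on, not two bare Series:
import pandas as pd

# Series align on their shared Date index, so axis=1 concatenation
# builds the two-column frame directly.
C = pd.concat([A, B], axis=1)
C.columns = ['City1', 'City2']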