Facebook-Prophet: Overflow error when fitting - facebook-prophet

I wanted to practice with Prophet, so I downloaded the "Yearly mean total sunspot number [1700 - now]" data from
http://www.sidc.be/silso/datafiles#total.
This is my code so far
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from fbprophet import Prophet
from fbprophet.plot import plot_plotly
import plotly.offline as py
import datetime
py.init_notebook_mode()
plt.style.use('classic')
df = pd.read_csv('SN_y_tot_V2.0.csv',delimiter=';', names = ['ds', 'y','C3', 'C4', 'C5'])
df = df.drop(columns=['C3', 'C4', 'C5'])
df.plot(x="ds", style='-',figsize=(10,5))
plt.xlabel('year',fontsize=15);plt.ylabel('mean number of sunspots',fontsize=15)
plt.xticks(np.arange(1701.5, 2018.5,40))
plt.ylim(-2,300);plt.xlim(1700,2020)
plt.legend()
df['ds'] = pd.to_datetime(df.ds, format='%Y')
m = Prophet(yearly_seasonality=True)
Everything looks good so far, and df['ds'] is in datetime format.
However when I execute
m.fit(df)
I get the following error
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-57-a8e399fdfab2> in <module>()
----> 1 m.fit(df)
/anaconda2/envs/mde/lib/python3.7/site-packages/fbprophet/forecaster.py in fit(self, df, **kwargs)
1055 self.history_dates = pd.to_datetime(df['ds']).sort_values()
1056
-> 1057 history = self.setup_dataframe(history, initialize_scales=True)
1058 self.history = history
1059 self.set_auto_seasonalities()
/anaconda2/envs/mde/lib/python3.7/site-packages/fbprophet/forecaster.py in setup_dataframe(self, df, initialize_scales)
286 df['cap_scaled'] = (df['cap'] - df['floor']) / self.y_scale
287
--> 288 df['t'] = (df['ds'] - self.start) / self.t_scale
289 if 'y' in df:
290 df['y_scaled'] = (df['y'] - df['floor']) / self.y_scale
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
990 # test_dt64_series_add_intlike, which the index dispatching handles
991 # specifically.
--> 992 result = dispatch_to_index_op(op, left, right, pd.DatetimeIndex)
993 return construct_result(
994 left, result, index=left.index, name=res_name, dtype=result.dtype
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/ops/__init__.py in dispatch_to_index_op(op, left, right, index_class)
628 left_idx = left_idx._shallow_copy(freq=None)
629 try:
--> 630 result = op(left_idx, right)
631 except NullFrequencyError:
632 # DatetimeIndex and TimedeltaIndex with freq == None raise ValueError
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/indexes/datetimelike.py in __sub__(self, other)
521 def __sub__(self, other):
522 # dispatch to ExtensionArray implementation
--> 523 result = self._data.__sub__(maybe_unwrap_index(other))
524 return wrap_arithmetic_op(self, other, result)
525
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/arrays/datetimelike.py in __sub__(self, other)
1278 result = self._add_offset(-other)
1279 elif isinstance(other, (datetime, np.datetime64)):
-> 1280 result = self._sub_datetimelike_scalar(other)
1281 elif lib.is_integer(other):
1282 # This check must come after the check for np.timedelta64
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py in _sub_datetimelike_scalar(self, other)
856
857 i8 = self.asi8
--> 858 result = checked_add_with_arr(i8, -other.value, arr_mask=self._isnan)
859 result = self._maybe_mask_results(result)
860 return result.view("timedelta64[ns]")
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/algorithms.py in checked_add_with_arr(arr, b, arr_mask, b_mask)
1006
1007 if to_raise:
-> 1008 raise OverflowError("Overflow in int64 addition")
1009 return arr + b
1010
OverflowError: Overflow in int64 addition
I understand that there's an issue with 'ds', but I am not sure whether there is something wrong with the column's format or whether this is an open issue.
Does anyone have any idea how to fix this? I have checked some issues on GitHub, but they haven't been of much help in this case.
Thanks

This is not a fix for the underlying issue, but a way to avoid the error.
I got the same error and managed to get rid of it when I reduced the amount of incoming data OR when I reduced the horizon of the forecast.
For example, I limited my training data to start in 1825, even though I have data from the 1700s. I also tried limiting my forecast from a 10-year horizon to only 1 year. Both got rid of the error.
My guess is that this has something to do with how the model is implemented inside Prophet itself: in some cases the numbers are just too huge to be handled by int64 and they overflow.
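The ~292-year limit of int64 nanoseconds is consistent with both observations above: pandas stores timestamps as nanoseconds since the epoch, and the subtraction `df['ds'] - self.start` in the traceback produces a timedelta64[ns], which overflows once the span exceeds roughly 292 years. A quick sketch with plain arithmetic (no Prophet needed):

```python
import numpy as np

# timedelta64[ns] is backed by int64, so the largest representable
# span is about 292 years; 1700 to 2019 is ~319 years and overflows.
NS_PER_YEAR = 365.25 * 24 * 3600 * 1e9
max_years = np.iinfo(np.int64).max / NS_PER_YEAR
print(round(max_years))                                       # 292
print((2019 - 1700) * NS_PER_YEAR > np.iinfo(np.int64).max)   # True
```

This also suggests why trimming the training data to start in 1825 (a span well under 292 years) avoids the error.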

Related

Iterating Rows in DataFrame and Applying difflib.ratio()

Context of Problem
I am working on a project where I would like to compare two columns from a dataframe to determine what percent of the strings are similar to each other. Specifically, I'm comparing whether bullets scraped from retailer websites match the bullets that I expect to see on those sites for a given product.
I know that I can simply use boolean logic to determine if the value from column ['X'] == column ['Y']. But I'd like to take it to another level and determine what percentage of X matches Y. I did some research and found that difflib's SequenceMatcher.ratio() can accomplish what I want.
Example of difflib.ratio()
from difflib import SequenceMatcher

a = 'preview'
b = 'previeu'
SequenceMatcher(a=a, b=b).ratio()  # 0.857... (2 * 6 matching chars / 14 total chars)
My Use Case
Where I'm having trouble is applying this logic to iterate through a DataFrame. This is what my DataFrame looks like.
[image: screenshot of the DataFrame]
The DataFrame has 5 "Bullets" and 5 "SEO Bullets". So I tried using a for loop to apply a lambda function to my DataFrame called test.
for x in range(1,6):
    test[f'Bullet {x} Ratio'] = test.apply(lambda row: SequenceMatcher(a=row[f'SeoBullet_{x}'], b=row[f'Bullet {x}']).ratio())
But I received the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-409-39a6ba3c8879> in <module>
1 for x in range(1,6):
----> 2 test[f'Bullet {x} Ratio'] = test.apply(lambda row: SequenceMatcher(a=row[f'SeoBullet_{x}'], b=row[f'Bullet {x}']).ratio())
~\AppData\Local\Programs\PythonCodingPack\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
7539 kwds=kwds,
7540 )
-> 7541 return op.get_result()
7542
7543 def applymap(self, func) -> "DataFrame":
~\AppData\Local\Programs\PythonCodingPack\lib\site-packages\pandas\core\apply.py in get_result(self)
178 return self.apply_raw()
179
--> 180 return self.apply_standard()
181
182 def apply_empty_result(self):
~\AppData\Local\Programs\PythonCodingPack\lib\site-packages\pandas\core\apply.py in apply_standard(self)
253
254 def apply_standard(self):
--> 255 results, res_index = self.apply_series_generator()
256
257 # wrap results
~\AppData\Local\Programs\PythonCodingPack\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
282 for i, v in enumerate(series_gen):
283 # ignore SettingWithCopy here in case the user mutates
--> 284 results[i] = self.f(v)
285 if isinstance(results[i], ABCSeries):
286 # If we have a view on v, we need to make a copy because
<ipython-input-409-39a6ba3c8879> in <lambda>(row)
1 for x in range(1,6):
----> 2 test[f'Bullet {x} Ratio'] = test.apply(lambda row: SequenceMatcher(a=row[f'SeoBullet_{x}'], b=row[f'Bullet {x}']).ratio())
~\AppData\Local\Programs\PythonCodingPack\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
880
881 elif key_is_scalar:
--> 882 return self._get_value(key)
883
884 if (
~\AppData\Local\Programs\PythonCodingPack\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
989
990 # Similar to Index.get_value, but we do not fall back to positional
--> 991 loc = self.index.get_loc(label)
992 return self.index._get_values_for_loc(self, loc, label)
993
~\AppData\Local\Programs\PythonCodingPack\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
352 except ValueError as err:
353 raise KeyError(key) from err
--> 354 raise KeyError(key)
355 return super().get_loc(key, method=method, tolerance=tolerance)
356
KeyError: 'SeoBullet_1'
Desired Output
Ideally, the final output would be a dataframe that has 5 additional columns with the ratios for each Bullet comparison.
I'm still new-ish to Python, so I could just be naïve and missing something very obvious. I mention this to say that if there is another route I could take to accomplish the same thing (or something very similar), I am open to those suggestions.
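No answer is recorded above, but the KeyError is consistent with one common slip: DataFrame.apply defaults to axis=0, so the lambda receives each *column* (a Series indexed by row labels), and row['SeoBullet_1'] fails. A sketch with a tiny made-up frame (column names taken from the question, values invented):

```python
import pandas as pd
from difflib import SequenceMatcher

# Tiny stand-in for the question's DataFrame (values are made up)
test = pd.DataFrame({
    'SeoBullet_1': ['preview', 'fast charging'],
    'Bullet 1': ['previeu', 'fast charge'],
})

# axis=1 passes each *row* to the lambda; the default axis=0 passes
# columns instead, which is what raises KeyError: 'SeoBullet_1'.
for x in range(1, 2):
    test[f'Bullet {x} Ratio'] = test.apply(
        lambda row: SequenceMatcher(a=row[f'SeoBullet_{x}'],
                                    b=row[f'Bullet {x}']).ratio(),
        axis=1,
    )
print(test['Bullet 1 Ratio'].round(3).tolist())  # [0.857, 0.833]
```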

Sum n values in numpy array based on pandas index

I am trying to calculate the cumulative sum of the first n values in a numpy array, where n is a value in each row of a pandas dataframe. I have set up a little example problem with a single column and it works fine, but it does not work when I have more than one column.
Example problem that fails:
a=np.ones((10,))
df=pd.DataFrame([[4.,2],[6.,1],[5.,2.]],columns=['nj','ni'])
df['nj']=df['nj'].astype(int)
df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
df
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_23612/1905114001.py in <module>
2 df=pd.DataFrame([[4.,2],[6.,1],[5.,2.]],columns=['nj','ni'])
3 df['nj']=df['nj'].astype(int)
----> 4 df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
5 df
C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
7766 kwds=kwds,
7767 )
-> 7768 return op.get_result()
7769
7770 def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:
C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\apply.py in get_result(self)
183 return self.apply_raw()
184
--> 185 return self.apply_standard()
186
187 def apply_empty_result(self):
C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\apply.py in apply_standard(self)
274
275 def apply_standard(self):
--> 276 results, res_index = self.apply_series_generator()
277
278 # wrap results
C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
288 for i, v in enumerate(series_gen):
289 # ignore SettingWithCopy here in case the user mutates
--> 290 results[i] = self.f(v)
291 if isinstance(results[i], ABCSeries):
292 # If we have a view on v, we need to make a copy because
~\AppData\Local\Temp/ipykernel_23612/1905114001.py in <lambda>(x)
2 df=pd.DataFrame([[4.,2],[6.,1],[5.,2.]],columns=['nj','ni'])
3 df['nj']=df['nj'].astype(int)
----> 4 df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
5 df
TypeError: slice indices must be integers or None or have an __index__ method
Example problem that works:
a=np.ones((10,))
df=pd.DataFrame([4.,6.,5.],columns=['nj'])
df['nj']=df['nj'].astype(int)
df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
df
nj nsum
0 4 4.0
1 6 6.0
2 5 5.0
In both cases:
print(a.shape)
print(a.dtype)
print(type(df))
print(df['nj'].dtype)
(10,)
float64
<class 'pandas.core.frame.DataFrame'>
int32
A workaround that is not very satisfying, especially because I would eventually like to use multiple columns in the lambda function, is:
tmp = pd.DataFrame(df['nj'])
df['nsum'] = tmp.apply(lambda x: np.sum(a[:x['nj']]), axis=1)
Any clarification on what I have missed here, or better workarounds?
IIUC, you can do it in numpy with numpy.take and numpy.cumsum:
np.take(np.cumsum(a, axis=0), df['nj'], axis=0)
A small adjustment to pass just the column of interest (df['nj']) to lambda solved my initial issue:
df['nsum'] = df['nj'].apply(lambda x: np.sum(a[:x]))
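A plausible explanation for why the two-column version fails (a sketch, assuming recent pandas): apply(..., axis=1) hands the lambda each row as a Series, and a Series has a single dtype, so mixed int and float columns are upcast to float64. x['nj'] then arrives as a float and cannot be used as a slice index, which matches the TypeError above:

```python
import pandas as pd

df = pd.DataFrame([[4., 2], [6., 1], [5., 2.]], columns=['nj', 'ni'])
df['nj'] = df['nj'].astype(int)

# Each row is a Series with one dtype; int and float columns upcast
# to float64, so x['nj'] is 4.0 rather than 4.
row = df.iloc[0]
print(row.dtype)   # float64
print(row['nj'])   # 4.0
```

That is also why applying to just df['nj'] works: an all-int Series stays int.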
Using mozway's suggestion of np.take and np.cumsum along with a less ambiguous(?) example, the following will also work (note the x-1, since the initial problem asks for "the cumulative sum of the first n values" rather than the cumulative sum up to index n):
a=np.array([3,2,4,5,1,2,3])
df=pd.DataFrame([[4.,2],[6.,1],[5.,3.]],columns=['nj','ni'])
df['nj']=df['nj'].astype(int)
df[['nsumj']]=df['nj'].apply(lambda x: np.take(np.cumsum(a),x-1))
#equivalent?
# df[['nsumj']]=df['nj'].apply(lambda x: np.cumsum(a)[x-1])
print(a)
print(df)
Output:
[3 2 4 5 1 2 3]
nj ni nsumj
0 4 2.0 14
1 6 1.0 17
2 5 3.0 15
From the example here it seems the key to using multiple columns in the function (the next issue I was running into and hinted at) is to unpack the columns, so I will put this here in case it helps anyone:
df['nprod']=df[['ni','nj']].apply(lambda x: np.multiply(*x),axis=1)

How do I filter by date and count these dates using relational algebra (Reframe)?

I'm really stuck. I read the Reframe documentation (https://reframe.readthedocs.io/en/latest/), which is based on pandas, and I tried multiple things on my own, but it still doesn't work. I have a CSV file called weddings that looks like this:
Weddings, Date
Wedding1,20181107
Wedding2,20181107
And many more rows. As you can see, there are duplicates in the date column, but I don't think that matters. I want to count the number of weddings filtered by date; for example, the number of weddings after 5 October 2016 (20161005). So first I tried this:
Weddings = Relation('weddings.csv')
Weddings.sort(['Date']).query('Date > 20161005').project(['Weddings', 'Date'])
This seems logical to me, but I get a KeyError 'Date' and don't know why. So I tried something simpler:
Weddings = Relation('weddings.csv')
Weddings.groupby(['Date']).count()
This doesn't work either; I still get a KeyError 'Date' and don't know why. Can someone help me?
Stack trace
KeyError Traceback (most recent call last)
<ipython-input-44-b358cdf55fdb> in <module>()
1
2 Weddings = Relation('weddings.csv')
----> 3 weddings.sort(['Date']).query('Date > 20161005').project(['Weddings', 'Date'])
4
5
~\Documents\Reframe.py in sort(self, *args,
**kwargs)
110 """
111
--> 112 return Relation(super().sort_values(*args, **kwargs))
113
114 def intersect(self, other):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in sort_values(self, by,
axis, ascending, inplace, kind, na_position)
4416 by = by[0]
4417 k = self._get_label_or_level_values(by, axis=axis,
-> 4418 stacklevel=stacklevel)
4419
4420 if isinstance(ascending, (tuple, list)):
~\Anaconda3\lib\site-packages\pandas\core\generic.py in
_get_label_or_level_values(self, key, axis, stacklevel)
1377 values = self.axes[axis].get_level_values(key)._values
1378 else:
-> 1379 raise KeyError(key)
1380
1381 # Check for duplicates
KeyError: 'Date'
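No answer is recorded above, but one detail in the sample CSV is suspicious: the header reads `Weddings, Date` with a space after the comma, so the column is likely named ' Date' (with a leading space), which would produce exactly this KeyError. A sketch using plain pandas (Reframe's Relation wraps a DataFrame, so the same idea should apply):

```python
import io
import pandas as pd

# Reproduce the question's header, including the space after the comma
csv = io.StringIO("Weddings, Date\nWedding1,20181107\nWedding2,20181107\n")

print(list(pd.read_csv(csv).columns))          # ['Weddings', ' Date'] -> KeyError 'Date'

csv.seek(0)
df = pd.read_csv(csv, skipinitialspace=True)   # strips the space after the delimiter
print(list(df.columns))                        # ['Weddings', 'Date']
print(int((df['Date'] > 20161005).sum()))      # weddings after 2016-10-05 -> 2
```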

Using set_index in time series to eliminate holiday data rows from DataFrame

I am trying to eliminate holiday data from a time series pandas DataFrame. The instructions I am following build a DatetimeIndex and use set_index() to apply it to the DataFrame, which results in a time series without the holidays. This set_index() call is not working for me. Check out the code...
data_day.tail()
Open High Low Close Volume
Date
2018-05-20 NaN NaN NaN NaN 0.0
2018-05-21 2732.50 2739.25 2725.25 2730.50 210297692.0
2018-05-22 2726.00 2741.75 2721.50 2738.25 179224835.0
2018-05-23 2731.75 2732.75 2708.50 2710.50 292305588.0
2018-05-24 2726.00 2730.50 2705.75 2725.00 312575571.0
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
usb = CustomBusinessDay(calendar=USFederalHolidayCalendar())
usb
<CustomBusinessDay>
data_day_No_Holiday = pd.date_range(start='9/7/2005', end='5/21/2018', freq=usb)
data_day_No_Holiday
DatetimeIndex(['2005-09-07', '2005-09-08', '2005-09-09', '2005-09-12',
'2005-09-13', '2005-09-14', '2005-09-15', '2005-09-16',
'2005-09-19', '2005-09-20',
...
'2018-05-08', '2018-05-09', '2018-05-10', '2018-05-11',
'2018-05-14', '2018-05-15', '2018-05-16', '2018-05-17',
'2018-05-18', '2018-05-21'],
dtype='datetime64[ns]', length=3187, freq='C')
data_day.set_index(data_day_No_Holidays, inplace=True)
----------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-118-cf7521d08f6f> in <module>()
----> 1 data_day.set_index(data_day_No_Holidays, inplace=True)
2 # inplace=True tells python to modify the original df and to NOT create a new one.
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
3923 index._cleanup()
3924
-> 3925 frame.index = index
3926
3927 if not inplace:
~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
4383 try:
4384 object.__getattribute__(self, name)
-> 4385 return object.__setattr__(self, name, value)
4386 except AttributeError:
4387 pass
pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
643
644 def _set_axis(self, axis, labels):
--> 645 self._data.set_axis(axis, labels)
646 self._clear_item_cache()
647
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in set_axis(self, axis, new_labels)
3321 raise ValueError(
3322 'Length mismatch: Expected axis has {old} elements, new '
-> 3323 'values have {new} elements'.format(old=old_len, new=new_len))
3324
3325 self.axes[axis] = new_labels
ValueError: Length mismatch: Expected axis has 4643 elements, new values have 3187 elements
This process seemed to work beautifully for another programmer.
Can anyone suggest a datatype conversion or a function that will apply the DatetimeIndex to the DataFrame, dropping all rows (holidays) that are NOT represented in the data_day_No_Holiday DatetimeIndex?
Thanks! Let me know if I made any formatting errors or if I am leaving out any relevant information...
Use reindex:
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
usb = CustomBusinessDay(calendar=USFederalHolidayCalendar())
data_day_No_Holiday = pd.date_range(start='1/1/2018', end='12/31/2018', freq=usb)
data_day = pd.DataFrame({'Values':np.random.randint(0,100,365)},index = pd.date_range('2018-01-01', periods=365, freq='D'))
data_day.reindex(data_day_No_Holiday).dropna()
Output(head):
Values
2018-01-02 38
2018-01-03 1
2018-01-04 16
2018-01-05 43
2018-01-08 95
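This also explains the original ValueError: set_index replaces labels one-for-one, so the new index must match the row count (4643 vs 3187 in the question), while reindex aligns by label and keeps only the requested rows. A minimal sketch with made-up data:

```python
import pandas as pd

# set_index needs exactly one new label per existing row;
# reindex instead aligns by label.
df = pd.DataFrame({'v': range(5)}, index=pd.date_range('2018-01-01', periods=5))
keep = pd.DatetimeIndex(['2018-01-02', '2018-01-04'])

try:
    df.set_index(keep)           # 5 rows vs 2 labels
except ValueError as e:
    print('set_index:', e)       # Length mismatch ...

print(df.reindex(keep))          # just the two requested dates
```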

pandas.io.ga not working for me

So I have worked through the Hello Analytics tutorial to confirm that OAuth2 is working as expected for me, but I'm not having any luck with the pandas.io.ga module. In particular, I am stuck with this error:
In [1]: from pandas.io import ga
In [2]: df = ga.read_ga("pageviews", "pagePath", "2014-07-08")
/usr/local/lib/python2.7/dist-packages/pandas/core/index.py:1162: FutureWarning: using '-' to provide set differences
with Indexes is deprecated, use .difference()
"use .difference()",FutureWarning)
/usr/local/lib/python2.7/dist-packages/pandas/core/index.py:1147: FutureWarning: using '+' to provide set union with
Indexes is deprecated, use '|' or .union()
"use '|' or .union()",FutureWarning)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-b5343faf9ae6> in <module>()
----> 1 df = ga.read_ga("pageviews", "pagePath", "2014-07-08")
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in read_ga(metrics, dimensions, start_date, **kwargs)
105 reader = GAnalytics(**reader_kwds)
106 return reader.get_data(metrics=metrics, start_date=start_date,
--> 107 dimensions=dimensions, **kwargs)
108
109
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in get_data(self, metrics, start_date, end_date, dimensions,
segment, filters, start_index, max_results, index_col, parse_dates, keep_date_col, date_parser, na_values, converters,
sort, dayfirst, account_name, account_id, property_name, property_id, profile_name, profile_id, chunksize)
293
294 if chunksize is None:
--> 295 return _read(start_index, max_results)
296
297 def iterator():
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in _read(start, result_size)
287 dayfirst=dayfirst,
288 na_values=na_values,
--> 289 converters=converters, sort=sort)
290 except HttpError as inst:
291 raise ValueError('Google API error %s: %s' % (inst.resp.status,
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in _parse_data(self, rows, col_info, index_col, parse_dates,
keep_date_col, date_parser, dayfirst, na_values, converters, sort)
313 keep_date_col=keep_date_col,
314 converters=converters,
--> 315 header=None, names=col_names))
316
317 if isinstance(sort, bool) and sort:
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
237
238 # Create the parser.
--> 239 parser = TextFileReader(filepath_or_buffer, **kwds)
240
241 if (nrows is not None) and (chunksize is not None):
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
551 self.options['has_index_names'] = kwds['has_index_names']
552
--> 553 self._make_engine(self.engine)
554
555 def _get_options_with_defaults(self, engine):
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
694 elif engine == 'python-fwf':
695 klass = FixedWidthFieldParser
--> 696 self._engine = klass(self.f, **self.options)
697
698 def _failover_to_python(self):
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, **kwds)
1412 if not self._has_complex_date_col:
1413 (index_names,
-> 1414 self.orig_names, self.columns) = self._get_index_name(self.columns)
1415 self._name_processed = True
1416 if self.index_names is None:
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _get_index_name(self, columns)
1886 # Case 2
1887 (index_name, columns_,
-> 1888 self.index_col) = _clean_index_names(columns, self.index_col)
1889
1890 return index_name, orig_names, columns
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _clean_index_names(columns, index_col)
2171 break
2172 else:
-> 2173 name = cp_cols[c]
2174 columns.remove(name)
2175 index_names.append(name)
TypeError: list indices must be integers, not Index
OAuth2 is working as expected and I have only used these parameters as demo variables--the query itself is junk. Basically, I cannot figure out where the error is coming from, and would appreciate any pointers that one may have.
Thanks!
SOLUTION (SORT OF)
Not sure if this has to do with the data I'm trying to access or what, but the offending Index type error arises because the index_col variable in pandas.io.ga.GDataReader.get_data() is of type pandas.core.index.Index. This is fed to pandas.io.parsers._read() by _parse_data(), which falls over. I don't understand this, but it is the breaking point for me.
As a fix--if anyone else is having this problem--I have edited line 270 of ga.py to:
index_col = _clean_index(list(dimensions), parse_dates).tolist()
and everything is now smooth as butter, but I suspect this may break things in other situations...
Unfortunately, this module isn't really documented and the errors aren't always meaningful. Include your account_name, property_name and profile_name (profile_name is the View in the online version). Then include some dimensions and metrics you are interested in. Also make sure that the client_secrets.json is in the pandas.io directory. An example:
ga.read_ga(account_name=account_name,
           property_name=property_name,
           profile_name=profile_name,
           dimensions=['date', 'hour', 'minute'],
           metrics=['pageviews'],
           start_date=start_date,
           end_date=end_date,
           index_col=0,
           parse_dates={'datetime': ['date', 'hour', 'minute']},
           date_parser=lambda x: datetime.strptime(x, '%Y%m%d %H %M'),
           max_results=max_results)
Also have a look at my recent step by step blog post about GA with pandas.