I'm trying to slice the last value of the series Change from my dataframe df.
The dataframe looks something like this:
Change
0 1.000000
1 0.917727
2 1.000000
3 0.914773
4 0.933182
5 0.936136
6 0.957500
...
14466949 1.998392
14466950 2.002413
14466951 1.998392
14466952 1.974266
14466953 1.966224
When I input the following code
df.Change[0]
df.Change[100]
df.Change[100000]
I'm getting an output, but when I input
df.Change[-1]
I'm getting the following error
Traceback (most recent call last):
File "<pyshell#188>", line 1, in <module>
df.Change[-1]
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "C:\Python27\lib\site-packages\pandas\indexes\base.py", line 2139, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas\index.c:3338)
File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas\index.c:3041)
File "pandas/index.pyx", line 151, in pandas.index.IndexEngine.get_loc (pandas\index.c:3898)
KeyError: -1
Pretty much any negative number I use for indexing results in an error, and I'm not exactly sure why.
Thanks.
There are several ways to do this. What's happening is that pandas has no issue with df.Change[100] because 100 is a label in its index, while -1 is not. Your index just happens to look like ordinal positions. To index by ordinal position explicitly, use iloc.
df.Change.iloc[-1]
or
df.Change.values[-1]
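For illustration, a minimal sketch with a toy Series (the values are made up):

```python
import pandas as pd

s = pd.Series([1.0, 0.92, 1.97], name="Change")  # default RangeIndex 0..2

s[0]          # label lookup: 0 is a label in the index
s.iloc[-1]    # positional lookup: always works -> 1.97
s.values[-1]  # drops down to the NumPy array   -> 1.97
```

With a default RangeIndex the label and the position coincide, which is why `s[0]` and `s.iloc[0]` agree here; only `iloc` (or `.values`) accepts -1 regardless of the index.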
Related
I have to run multiple models for different geo regions, and some geo regions might have gaps in the data too.
For instance, the data might look like the below:
Here, I have data from 2019 for only 6 days, then 1 day in 2020, and the rest from 2021.
Now I want to run cross-validation on this dataset. The total number of rows is 423.
from prophet.diagnostics import cross_validation as cv, performance_metrics as pm
import pandas as pd
df = pd.read_csv('..')
model = ProphetPos(holidays=holidays, **config.MODEL_PARAMS)
model.fit(df)
results = cv(model, horizon='182 days', initial='182 days', period='30 days', parallel='threads')
When I try to run the code, the cross-validation step gives an error -
File "/usr/local/lib/python3.8/site-packages/prophet/diagnostics.py", line 202, in cross_validation
return pd.concat(predicts, axis=0).reset_index(drop=True)
File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 285, in concat
op = _Concatenator(
File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 339, in __init__
objs = list(objs)
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/prophet/diagnostics.py", line 250, in single_cutoff_forecast
yhat = m.predict(df[index_predicted][columns])
File "/usr/local/lib/python3.8/site-packages/inf_forecast_colo/utils/prophet_forecaster.py", line 1682, in predict
if fcst['trend'].iloc[-1] < fcst['trend'].iloc[-2]:
File "/usr/local/lib/python3.8/site-packages/pandas/core/indexing.py", line 895, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/usr/local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1501, in _getitem_axis
self._validate_integer(key, axis)
File "/usr/local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1444, in _validate_integer
raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
I guess what is happening is that the initial window only has data for 5 of its 180 days, and then some test windows have 0 records to evaluate against. Hence it fails.
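One way to confirm where the history breaks is to look for large jumps between consecutive dates. A sketch on made-up dates; the `ds` column name follows Prophet's convention and the 30-day threshold is arbitrary:

```python
import pandas as pd

# Toy history with large gaps, standing in for the real df
df = pd.DataFrame({"ds": pd.to_datetime([
    "2019-01-01", "2019-01-02", "2020-06-01", "2021-01-01", "2021-01-02"])})

# Days elapsed between consecutive observations (first diff is NaN -> fill with 1)
day_gaps = df["ds"].sort_values().diff().dt.days.fillna(1)
big_gaps = day_gaps[day_gaps > 30]   # jumps longer than a month
print(big_gaps)
```

Once the gap locations are known, you could pass explicit `cutoffs` to `cross_validation` (via its `cutoffs` parameter) so that every test window falls where data actually exists.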
I am sorting data within a dataframe. I take the data from one column, bin it into intervals of 2 units, and then want a list of the intervals created. This is the code:
df = gpd.read_file(file)
final = df[df['i_h100'] >= 0].copy()  # .copy() avoids SettingWithCopyWarning
final['H_bins'] = pd.cut(x=final['i_h100'], bins=np.arange(0, 120 + 2, 2))
HBins = list(np.unique(final['H_bins']))
With most of my dataframes this works totally fine but occasionally I get the following error:
Traceback (most recent call last):
File "/var/folders/bg/5n2pqdj53xv1lm099dt8z2lw0000gn/T/ipykernel_2222/1548505304.py", line 1, in <cell line: 1>
HBins = list(np.unique(final['H_bins']))
File "<__array_function__ internals>", line 180, in unique
File "/Users/heatherkay/miniconda3/envs/gpd/lib/python3.9/site-packages/numpy/lib/arraysetops.py", line 272, in unique
ret = _unique1d(ar, return_index, return_inverse, return_counts)
File "/Users/heatherkay/miniconda3/envs/gpd/lib/python3.9/site-packages/numpy/lib/arraysetops.py", line 333, in _unique1d
ar.sort()
TypeError: '<' not supported between instances of 'float' and 'pandas._libs.interval.Interval'
I don't understand why, or how to resolve this.
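The most likely cause: some rows fall outside the [0, 120] bin range (or `i_h100` has NaN), so `pd.cut` produces NaN entries, and `np.unique` then tries to sort a mix of floats and `Interval` objects, which is exactly the error above. A minimal sketch of the failure and a fix, on made-up values:

```python
import numpy as np
import pandas as pd

vals = pd.Series([1.5, 3.0, 130.0])              # 130 is outside the bins
binned = pd.cut(vals, bins=np.arange(0, 122, 2)) # last entry becomes NaN
# np.unique(binned) would raise: NaN (a float) can't be ordered against Interval
intervals = list(binned.dropna().unique())       # drop NaN first, then unique
```

Dropping the NaN before taking unique values sidesteps the mixed-type sort; alternatively, widen the bins so no value falls outside them.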
Trying to sum columns in a Pandas dataframe; it seems to be an issue with the index.
Part of the dataset looks like this, for multiple years:
[snapshot of dataset]
CA_HousingTrend = CA_HousingTrend_temp.pivot_table(index='YEAR', columns='UNITSSTR', aggfunc='size')
The dataframe now looks like this:
[screenshot of the pivoted dataframe and its properties]
Trying to sum the multi-family units, so I am specifying the columns to sum:
cols = ['05', '06']
CA_HousingTrend['sum_stats'] = CA_HousingTrend[cols].sum(axis=1)
This is the error I get:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
exec(code, self.locals)
File "", line 5, in
File "/Users/alexandramaxim/Documents/Py/lib/python3.10/site-packages/pandas/core/frame.py", line 3511, in __getitem__
indexer = self.columns._get_indexer_strict(key, "columns")
File "/Users/alexandramaxim/Documents/Py/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5782, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
File "/Users/alexandramaxim/Documents/Py/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5842, in _raise_if_missing
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['05', '06'], dtype='object', name='UNITSSTR')] are in the [columns]"
Not sure if you need the index, but the pivot probably created a multi-index on the rows. Try this:
CA_HousingTrend = CA_HousingTrend_temp.pivot_table(index='YEAR', columns='UNITSSTR', aggfunc='size')
# A new dataframe, just so you have something new to play with.
new_df = CA_HousingTrend.reset_index()
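For reference, the whole flow on toy data (the values are made up). It's also worth checking that the pivoted column labels really are the strings '05' and '06' and not numbers, since that alone would produce this KeyError:

```python
import pandas as pd

# Toy stand-in for CA_HousingTrend_temp
temp = pd.DataFrame({"YEAR": [2000, 2000, 2001],
                     "UNITSSTR": ["05", "06", "05"]})

pivoted = temp.pivot_table(index="YEAR", columns="UNITSSTR", aggfunc="size")
flat = pivoted.reset_index()                      # YEAR back to a plain column
flat["sum_stats"] = flat[["05", "06"]].sum(axis=1)  # NaN counts are skipped
print(flat)
```

`aggfunc='size'` counts rows per (YEAR, UNITSSTR) pair; combinations with no rows come back as NaN, which `sum(axis=1)` skips by default.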
I work with timestamps that have mixed DST offsets. Tried in pandas 1.0.0:
s = pd.Series(
[pd.Timestamp('2020-02-01 11:35:44+01'),
np.nan, # same result with pd.Timestamp('nat')
pd.Timestamp('2019-04-13 12:10:20+02')])
Asking for min() or max() fails:
s.min(), s.max() # same result with s.min(skipna=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 11216, in stat_func
f, name, axis=axis, skipna=skipna, numeric_only=numeric_only
File "C:\Anaconda\lib\site-packages\pandas\core\series.py", line 3892, in _reduce
return op(delegate, skipna=skipna, **kwds)
File "C:\Anaconda\lib\site-packages\pandas\core\nanops.py", line 125, in f
result = alt(values, axis=axis, skipna=skipna, **kwds)
File "C:\Anaconda\lib\site-packages\pandas\core\nanops.py", line 837, in reduction
result = getattr(values, meth)(axis)
File "C:\Anaconda\lib\site-packages\numpy\core\_methods.py", line 34, in _amin
return umr_minimum(a, axis, None, out, keepdims, initial, where)
TypeError: '<=' not supported between instances of 'Timestamp' and 'float'
Workaround:
s.loc[s.notna()].min(), s.loc[s.notna()].max()
(Timestamp('2019-04-13 12:10:20+0200', tz='pytz.FixedOffset(120)'), Timestamp('2020-02-01 11:35:44+0100', tz='pytz.FixedOffset(60)'))
What am I missing here? Is it a bug?
I think the problem here is that pandas stores a Series with mixed timezones as object dtype, so max and min fail.
s = pd.Series(
[pd.Timestamp('2020-02-01 11:35:44+01'),
np.nan, # same result with pd.Timestamp('nat')
pd.Timestamp('2019-04-13 12:10:20+02')])
print (s)
0 2020-02-01 11:35:44+01:00
1 NaN
2 2019-04-13 12:10:20+02:00
dtype: object
So if you convert to datetimes (normalizing to UTC, losing the mixed timezones), it works well:
print (pd.to_datetime(s, utc=True))
0 2020-02-01 10:35:44+00:00
1 NaT
2 2019-04-13 10:10:20+00:00
dtype: datetime64[ns, UTC]
print (pd.to_datetime(s, utc=True).max())
2020-02-01 10:35:44+00:00
Another possible solution, if you need to keep the different timezones, is:
print (s.dropna().max())
2020-02-01 11:35:44+01:00
Been running into this same error for 2 weeks, even though the code worked before. Not sure if I updated pandas as part of another library install, and maybe something changed there. Currently on version 0.23.4. The expected outcome is returning just the row with that identifier value.
In [42]: df.head()
Out[42]:
index Identifier ...
0 51384710 ...
1 74838J10 ...
2 80589M10 ...
3 67104410 ...
4 50241310 ...
[5 rows x 14 columns]
In [43]: df.loc[df.Identifier.isin(['51384710'])].head()
Traceback (most recent call last):
File "C:\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-44-a3dbf43451ef>", line 1, in <module>
df.loc[df.Identifier.isin(['51384710'])].head()
File "C:\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1478, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1899, in _getitem_axis
raise ValueError('Cannot index with multidimensional key')
ValueError: Cannot index with multidimensional key
Fixed it. I'd done df.columns = [column_list] where column_list = [...], which caused df to be treated as if it had a MultiIndex, even though there was only one level. Removing the brackets from the df.columns assignment fixed it.
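For anyone hitting the same thing, a minimal sketch of that failure mode (the column names here are made up):

```python
import pandas as pd

df = pd.DataFrame([["51384710", 1.0]], columns=["c1", "c2"])
column_list = ["Identifier", "Price"]

df.columns = [column_list]   # wrapping the list creates a 1-level MultiIndex
print(type(df.columns))      # <class 'pandas.core.indexes.multi.MultiIndex'>
# column selections now come back 2-D, so boolean masks break .loc

df.columns = column_list     # plain list -> regular Index again
```

Assigning a list of lists to `df.columns` is interpreted as one array per index level, hence the accidental MultiIndex.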
Try changing
df.loc[df.Identifier.isin(['51384710'])].head()
to
df[df.Identifier.isin(['51384710'])].head()