Building MultiGraph from pandas dataframe - "TypeError: unhashable type: 'dict'" - pandas

I am experiencing the same issue as the one described in "Networkx Multigraph from_pandas_dataframe". Although I replaced line 211 in convert_matrix.py, the "TypeError: unhashable type: 'dict'" error still occurs. I want to build a MultiGraph from the following dataframe (links):
   1_id   f      v  v_id_1  v_id_2
0  3483  50  38000     739    2232
1  3482  50  38000     717    2196
2  3482  50  22000     717    2196
3  3480  50  22000    1058    2250
import pandas as pd
import networkx as nx
data = {'1_id': [3483, 3482, 3482, 3480], 'v_id_1': [739, 717, 717, 1058], 'v_id_2': [2232, 2196, 2196, 2250], 'v': [38000, 38000, 22000, 22000], 'f': [50, 50, 50, 50]}
links = pd.DataFrame(data)
G = nx.from_pandas_dataframe(links, 'v_id_1', 'v_id_2', edge_attr=['v', 'f'], create_using=nx.MultiGraph())
Trying to create the MultiGraph I'm getting the error:
TypeError Traceback (most recent call last)
<ipython-input-49-d2c7b8312ea7> in <module>()
----> 1 MG= nx.from_pandas_dataframe(df, 'gene1', 'gene2', ['conf','type'], create_using=nx.MultiGraph())
/usr/lib/python2.7/site-packages/networkx-1.10-py2.7.egg/networkx/convert_matrix.pyc in from_pandas_dataframe(df, source, target, edge_attr, create_using)
209 # Iteration on values returns the rows as Numpy arrays
210 for row in df.values:
--> 211 g.add_edge(row[src_i], row[tar_i], {i:row[j] for i, j in edge_i})
212
213 # If no column names are given, then just return the edges.
/usr/lib/python2.7/site-packages/networkx-1.10-py2.7.egg/networkx/classes/multigraph.pyc in add_edge(self, u, v, key, attr_dict, **attr)
340 datadict.update(attr_dict)
341 keydict = self.edge_key_dict_factory()
--> 342 keydict[key] = datadict
343 self.adj[u][v] = keydict
344 self.adj[v][u] = keydict
TypeError: unhashable type: 'dict'

After posting this issue on GitHub, I got a good answer which, at least in my case, seems to work. I had installed networkx 1.11 instead of version 2.0.dev_20161206165920. Try installing the development version of NetworkX from GitHub.
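For readers on a current release, here is a minimal sketch of the same construction. It assumes NetworkX 2.1 or later, where (as far as I know) from_pandas_dataframe was renamed to from_pandas_edgelist. It is not the exact call from the GitHub answer, just an illustration using the dataframe from the question.

```python
import pandas as pd
import networkx as nx

data = {'1_id': [3483, 3482, 3482, 3480],
        'v_id_1': [739, 717, 717, 1058],
        'v_id_2': [2232, 2196, 2196, 2250],
        'v': [38000, 38000, 22000, 22000],
        'f': [50, 50, 50, 50]}
links = pd.DataFrame(data)

# Assumption: a NetworkX release that provides from_pandas_edgelist.
G = nx.from_pandas_edgelist(links, 'v_id_1', 'v_id_2',
                            edge_attr=['v', 'f'],
                            create_using=nx.MultiGraph())

# The two rows linking 717 and 2196 become two parallel edges,
# each carrying its own 'v' and 'f' attributes.
parallel = G.get_edge_data(717, 2196)
print(G.number_of_edges())
print(parallel)
```

With a MultiGraph the duplicate node pair is kept as two separate edges rather than being merged, which is the reason for using it here.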

Sum n values in numpy array based on pandas index

I am trying to calculate the cumulative sum of the first n values in a numpy array, where n is a value in each row of a pandas dataframe. I have set up a little example problem with a single column and it works fine, but it does not work when I have more than one column.
Example problem that fails:
a=np.ones((10,))
df=pd.DataFrame([[4.,2],[6.,1],[5.,2.]],columns=['nj','ni'])
df['nj']=df['nj'].astype(int)
df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
df
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_23612/1905114001.py in <module>
2 df=pd.DataFrame([[4.,2],[6.,1],[5.,2.]],columns=['nj','ni'])
3 df['nj']=df['nj'].astype(int)
----> 4 df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
5 df
C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
7766 kwds=kwds,
7767 )
-> 7768 return op.get_result()
7769
7770 def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:
C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\apply.py in get_result(self)
183 return self.apply_raw()
184
--> 185 return self.apply_standard()
186
187 def apply_empty_result(self):
C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\apply.py in apply_standard(self)
274
275 def apply_standard(self):
--> 276 results, res_index = self.apply_series_generator()
277
278 # wrap results
C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
288 for i, v in enumerate(series_gen):
289 # ignore SettingWithCopy here in case the user mutates
--> 290 results[i] = self.f(v)
291 if isinstance(results[i], ABCSeries):
292 # If we have a view on v, we need to make a copy because
~\AppData\Local\Temp/ipykernel_23612/1905114001.py in <lambda>(x)
2 df=pd.DataFrame([[4.,2],[6.,1],[5.,2.]],columns=['nj','ni'])
3 df['nj']=df['nj'].astype(int)
----> 4 df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
5 df
TypeError: slice indices must be integers or None or have an __index__ method
Example problem that works:
a=np.ones((10,))
df=pd.DataFrame([4.,6.,5.],columns=['nj'])
df['nj']=df['nj'].astype(int)
df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
df
nj nsum
0 4 4.0
1 6 6.0
2 5 5.0
In both cases:
print(a.shape)
print(a.dtype)
print(type(df))
print(df['nj'].dtype)
(10,)
float64
<class 'pandas.core.frame.DataFrame'>
int32
A work around that is not very satisfying, especially because I would eventually like to use multiple columns in the lambda function, is:
tmp=pd.DataFrame(df['nj'])
df['nsum'] = tmp.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
Any clarification on what I have missed here or better work arounds?
IIUC, you can do it in numpy with numpy.take and numpy.cumsum:
np.take(np.cumsum(a, axis=0), df['nj'], axis=0)
A small adjustment to pass just the column of interest (df['nj']) to lambda solved my initial issue:
df['nsum'] = df['nj'].apply(lambda x: np.sum(a[:x]))
Using mozway's suggestion of np.take and np.cumsum with a less ambiguous example, the following also works (note the x-1: the question asks for "the cumulative sum of the first n values", not the cumulative sum up to index n):
a=np.array([3,2,4,5,1,2,3])
df=pd.DataFrame([[4.,2],[6.,1],[5.,3.]],columns=['nj','ni'])
df['nj']=df['nj'].astype(int)
df['nsumj']=df['nj'].apply(lambda x: np.take(np.cumsum(a),x-1))
# equivalent, indexing the cumulative sum directly:
# df['nsumj']=df['nj'].apply(lambda x: np.cumsum(a)[x-1])
print(a)
print(df)
Output:
[3 2 4 5 1 2 3]
nj ni nsumj
0 4 2.0 14
1 6 1.0 17
2 5 3.0 15
From the example here, it seems the key to using multiple columns in the function (the next issue I was running into, and hinted at above) is to unpack the columns, so I will put this here in case it helps anyone:
df['nprod']=df[['ni','nj']].apply(lambda x: np.multiply(*x),axis=1)
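As for why the two-column example fails while the single-column one works: with axis=1, apply hands each row to the function as a Series, and a Series has a single dtype. Next to the float column 'ni', the int column 'nj' is upcast to float, so x['nj'] is no longer a valid slice index. The sketch below reuses the frame from the question to show this, together with two ways around it. The names row_values_are_float and nsum_vec are only for illustration.

```python
import numpy as np
import pandas as pd

a = np.ones(10)
df = pd.DataFrame([[4., 2], [6., 1], [5., 2.]], columns=['nj', 'ni'])
df['nj'] = df['nj'].astype(int)

# Inside a row-wise apply, 'nj' arrives as a float because the row
# Series is upcast to a dtype that can hold both columns.
row_values_are_float = df.apply(lambda row: isinstance(row['nj'], float), axis=1)

# Option 1: cast back to int inside the function before slicing.
df['nsum'] = df.apply(lambda row: np.sum(a[:int(row['nj'])]), axis=1)

# Option 2: skip apply and index the cumulative sum directly
# (n - 1 because the first n values end at index n - 1).
df['nsum_vec'] = np.cumsum(a)[df['nj'].to_numpy() - 1]

print(df)
```

The second option avoids the per-row Python call entirely, which tends to matter once the frame gets large.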

I can't retrieve footprints from place information

When I try to retrieve footprints from a place name using
import osmnx as ox
tags = {'building': True}
gdf = ox.geometries_from_place('Piedmont, California, USA', tags)
I get the following error message:
IllegalArgumentException: Argument must be Polygonal or LinearRing
PredicateError: Failed to evaluate <_FuncPtr object at 0x13a2ea120>
In the past, I successfully used the old function ox.footprints_from_place() to retrieve footprints. However, it no longer works, and neither does the new function. Has anybody had the same issue with the new version (1.0.1) of the osmnx package?
Due to stackoverflow restrictions I can't post the complete traceback message. It seems that osmnx does not create the required polygon. The first error entries are:
---------------------------------------------------------------------------
PredicateError Traceback (most recent call last)
<ipython-input-13-98877af3189c> in <module>
1 import osmnx as ox
2 tags = {'building': True}
----> 3 gdf = ox.geometries_from_place('Piedmont, California, USA', tags)
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/osmnx/geometries.py in geometries_from_place(query, tags, which_result, buffer_dist)
214
215 # create GeoDataFrame using this polygon(s) geometry
--> 216 gdf = geometries_from_polygon(polygon, tags)
217
218 return gdf
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/osmnx/geometries.py in geometries_from_polygon(polygon, tags)
264
265 # create GeoDataFrame from the downloaded data
--> 266 gdf = _create_gdf(response_jsons, polygon, tags)
267
268 return gdf
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/osmnx/geometries.py in _create_gdf(response_jsons, polygon, tags)
428
429 # Apply .buffer(0) to any invalid geometries
--> 430 gdf = _buffer_invalid_geometries(gdf)
431
432 # Filter final gdf to requested tags and query polygon
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/osmnx/geometries.py in _buffer_invalid_geometries(gdf)
891
892 # create a filter for rows with invalid geometries
--> 893 invalid_geometry_filter = ~gdf["geometry"].is_valid
894
895 # if there are invalid geometries
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/geopandas/base.py in is_valid(self)
168 """Returns a ``Series`` of ``dtype('bool')`` with value ``True`` for
169 geometries that are valid."""
--> 170 return _delegate_property("is_valid", self)
171
172 @property
The last traceback messages are :
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/shapely/predicates.py in __call__(self, this)
23 def __call__(self, this):
24 self._validate(this)
---> 25 return self.fn(this._geom)
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/shapely/geos.py in errcheck_predicate(result, func, argtuple)
582 """Result is 2 on exception, 1 on True, 0 on False"""
583 if result == 2:
--> 584 raise PredicateError("Failed to evaluate %s" % repr(func))
585 return result
586
PredicateError: Failed to evaluate <_FuncPtr object at 0x13a2ea120>

Facebook-Prophet: Overflow error when fitting

I wanted to practice with prophet so I decided to download the "Yearly mean total sunspot number [1700 - now]" data from this place
http://www.sidc.be/silso/datafiles#total.
This is my code so far
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from fbprophet import Prophet
from fbprophet.plot import plot_plotly
import plotly.offline as py
import datetime
py.init_notebook_mode()
plt.style.use('classic')
df = pd.read_csv('SN_y_tot_V2.0.csv',delimiter=';', names = ['ds', 'y','C3', 'C4', 'C5'])
df = df.drop(columns=['C3', 'C4', 'C5'])
df.plot(x="ds", style='-',figsize=(10,5))
plt.xlabel('year',fontsize=15);plt.ylabel('mean number of sunspots',fontsize=15)
plt.xticks(np.arange(1701.5, 2018.5,40))
plt.ylim(-2,300);plt.xlim(1700,2020)
plt.legend()
df['ds'] = pd.to_datetime(df.ds, format='%Y')
m = Prophet(yearly_seasonality=True)
Everything looks good so far, and df['ds'] is in datetime format.
However when I execute
m.fit(df)
I get the following error
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-57-a8e399fdfab2> in <module>()
----> 1 m.fit(df)
/anaconda2/envs/mde/lib/python3.7/site-packages/fbprophet/forecaster.py in fit(self, df, **kwargs)
1055 self.history_dates = pd.to_datetime(df['ds']).sort_values()
1056
-> 1057 history = self.setup_dataframe(history, initialize_scales=True)
1058 self.history = history
1059 self.set_auto_seasonalities()
/anaconda2/envs/mde/lib/python3.7/site-packages/fbprophet/forecaster.py in setup_dataframe(self, df, initialize_scales)
286 df['cap_scaled'] = (df['cap'] - df['floor']) / self.y_scale
287
--> 288 df['t'] = (df['ds'] - self.start) / self.t_scale
289 if 'y' in df:
290 df['y_scaled'] = (df['y'] - df['floor']) / self.y_scale
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
990 # test_dt64_series_add_intlike, which the index dispatching handles
991 # specifically.
--> 992 result = dispatch_to_index_op(op, left, right, pd.DatetimeIndex)
993 return construct_result(
994 left, result, index=left.index, name=res_name, dtype=result.dtype
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/ops/__init__.py in dispatch_to_index_op(op, left, right, index_class)
628 left_idx = left_idx._shallow_copy(freq=None)
629 try:
--> 630 result = op(left_idx, right)
631 except NullFrequencyError:
632 # DatetimeIndex and TimedeltaIndex with freq == None raise ValueError
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/indexes/datetimelike.py in __sub__(self, other)
521 def __sub__(self, other):
522 # dispatch to ExtensionArray implementation
--> 523 result = self._data.__sub__(maybe_unwrap_index(other))
524 return wrap_arithmetic_op(self, other, result)
525
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/arrays/datetimelike.py in __sub__(self, other)
1278 result = self._add_offset(-other)
1279 elif isinstance(other, (datetime, np.datetime64)):
-> 1280 result = self._sub_datetimelike_scalar(other)
1281 elif lib.is_integer(other):
1282 # This check must come after the check for np.timedelta64
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py in _sub_datetimelike_scalar(self, other)
856
857 i8 = self.asi8
--> 858 result = checked_add_with_arr(i8, -other.value, arr_mask=self._isnan)
859 result = self._maybe_mask_results(result)
860 return result.view("timedelta64[ns]")
/anaconda2/envs/mde/lib/python3.7/site-packages/pandas/core/algorithms.py in checked_add_with_arr(arr, b, arr_mask, b_mask)
1006
1007 if to_raise:
-> 1008 raise OverflowError("Overflow in int64 addition")
1009 return arr + b
1010
OverflowError: Overflow in int64 addition
I understand that there's an issue with 'ds', but I am not sure whether there is something wrong with the column's format or whether this is an open issue.
Does anyone have any idea how to fix this? I have checked some issues on GitHub, but they haven't been of much help in this case.
Thanks
This is not an answer that fixes the issue, but a way to avoid the error.
I got the same error and managed to get rid of it by either reducing the amount of input data or shortening the forecast horizon.
For example, I limited my training data to start in 1825, although my data goes back to the 1700s. I also tried cutting the forecast from 10 years to only 1 year. Both got rid of the error.
My guess is that the problem comes from the date arithmetic inside Prophet: the traceback ends in the pandas datetime subtraction df['ds'] - self.start, and in some cases the resulting number is simply too large for int64 and overflows.
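A rough way to check whether a date range can trigger this kind of overflow: pandas stores nanosecond-resolution time differences in int64, which caps a single difference at roughly 292 years. The sketch below is my reading of the traceback, not something confirmed by the Prophet maintainers. The years array is a stand-in for the sunspot series, not the real file.

```python
import numpy as np
import pandas as pd

# Largest span a nanosecond-resolution Timedelta can represent, in years.
max_span_years = pd.Timedelta.max.days / 365.25

# Stand-in for the yearly series in the question (1700 through 2018).
years = np.arange(1700, 2019)
full_span = int(years.max() - years.min())

# Same series trimmed to start in 1825, as in the workaround above.
trimmed = years[years >= 1825]
trimmed_span = int(trimmed.max() - trimmed.min())

print(f"limit: about {max_span_years:.1f} years")
print(f"full span: {full_span} years, trimmed span: {trimmed_span} years")
```

The full history spans more years than the limit, while the trimmed one stays well under it, which would be consistent with both workarounds: a shorter history or a shorter forecast horizon each reduce the total span of dates involved.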

matplotlib - ImportError: No module named _tkinter

I have a simple notebook with the following code:
%matplotlib inline
However, when running it I get the following error:
ImportError: No module named _tkinter
I have another notebook in the same project, and that one is able to run the statement without issue.
Data Science Experience (DSX) is a managed service, so you don't have root access to install _tkinter.
Full stacktrace:
ImportError Traceback (most recent call last)
<ipython-input-43-5f9c00ae8c2d> in <module>()
----> 1 get_ipython().magic(u'matplotlib inline')
2
3 import matplotlib.pyplot as plt
4 #import numpy as np
5
/usr/local/src/bluemix_jupyter_bundle.v20/notebook/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
2161 magic_name, _, magic_arg_s = arg_s.partition(' ')
2162 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2163 return self.run_line_magic(magic_name, magic_arg_s)
2164
2165 #-------------------------------------------------------------------------
/usr/local/src/bluemix_jupyter_bundle.v20/notebook/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
2082 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2083 with self.builtin_trap:
-> 2084 result = fn(*args,**kwargs)
2085 return result
2086
<decorator-gen-106> in matplotlib(self, line)
/usr/local/src/bluemix_jupyter_bundle.v20/notebook/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
/usr/local/src/bluemix_jupyter_bundle.v20/notebook/lib/python2.7/site-packages/IPython/core/magics/pylab.pyc in matplotlib(self, line)
98 print("Available matplotlib backends: %s" % backends_list)
99 else:
--> 100 gui, backend = self.shell.enable_matplotlib(args.gui)
101 self._show_matplotlib_backend(args.gui, backend)
102
/usr/local/src/bluemix_jupyter_bundle.v20/notebook/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in enable_matplotlib(self, gui)
2949 gui, backend = pt.find_gui_and_backend(self.pylab_gui_select)
2950
-> 2951 pt.activate_matplotlib(backend)
2952 pt.configure_inline_support(self, backend)
2953
/usr/local/src/bluemix_jupyter_bundle.v20/notebook/lib/python2.7/site-packages/IPython/core/pylabtools.pyc in activate_matplotlib(backend)
293 matplotlib.rcParams['backend'] = backend
294
--> 295 import matplotlib.pyplot
296 matplotlib.pyplot.switch_backend(backend)
297
/gpfs/fs01/user/sdd1-7e9fd7607be53e-39ca506ba762/.local/lib/python2.7/site-packages/matplotlib/pyplot.py in <module>()
112
113 from matplotlib.backends import pylab_setup
--> 114 _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
115
116 _IP_REGISTERED = None
/gpfs/fs01/user/sdd1-7e9fd7607be53e-39ca506ba762/.local/lib/python2.7/site-packages/matplotlib/backends/__init__.pyc in pylab_setup()
30 # imports. 0 means only perform absolute imports.
31 backend_mod = __import__(backend_name,
---> 32 globals(),locals(),[backend_name],0)
33
34 # Things we pull in from all backends
/gpfs/fs01/user/sdd1-7e9fd7607be53e-39ca506ba762/.local/lib/python2.7/site-packages/matplotlib/backends/backend_tkagg.py in <module>()
4
5 from matplotlib.externals import six
----> 6 from matplotlib.externals.six.moves import tkinter as Tk
7 from matplotlib.externals.six.moves import tkinter_filedialog as FileDialog
8
/gpfs/fs01/user/sdd1-7e9fd7607be53e-39ca506ba762/.local/lib/python2.7/site-packages/matplotlib/externals/six.pyc in load_module(self, fullname)
197 mod = self.__get_module(fullname)
198 if isinstance(mod, MovedModule):
--> 199 mod = mod._resolve()
200 else:
201 mod.__loader__ = self
/gpfs/fs01/user/sdd1-7e9fd7607be53e-39ca506ba762/.local/lib/python2.7/site-packages/matplotlib/externals/six.pyc in _resolve(self)
111
112 def _resolve(self):
--> 113 return _import_module(self.mod)
114
115 def __getattr__(self, attr):
/gpfs/fs01/user/sdd1-7e9fd7607be53e-39ca506ba762/.local/lib/python2.7/site-packages/matplotlib/externals/six.pyc in _import_module(name)
78 def _import_module(name):
79 """Import module, returning the module after the last dot."""
---> 80 __import__(name)
81 return sys.modules[name]
82
/usr/local/src/bluemix_jupyter_bundle.v20/notebook/lib/python2.7/lib-tk/Tkinter.py in <module>()
37 # Attempt to configure Tcl/Tk without requiring PATH
38 import FixTk
---> 39 import _tkinter # If this fails your Python may not be configured for Tk
40 tkinter = _tkinter # b/w compat for export
41 TclError = _tkinter.TclError
ImportError: No module named _tkinter
So the fix was quite simple - I just had to restart the kernel using the kernel menu item in the notebook.
I had experienced the same problem when running ipython locally on my laptop and the solution was to install tkinter, so I wasn't expecting the answer to be as simple as restarting the kernel.
Another time I received this error message, restarting the kernel did not work. I had to:
change the spark backend
download the notebook to file
delete the notebook in DSX
create a new notebook from the downloaded notebook
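Separately from the restart, the traceback shows matplotlib trying to load the TkAgg backend, which is what pulls in _tkinter. One option that sidesteps Tk altogether is to select the non-GUI Agg backend before pyplot is imported. This is not part of the fix described above, only a sketch, and it assumes nothing else in the session has already imported pyplot with a different backend.

```python
import matplotlib

# Agg renders to an in-memory raster and needs no GUI toolkit,
# so _tkinter is never imported.
matplotlib.use('Agg')

import matplotlib.pyplot as plt

backend = matplotlib.get_backend()

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
print(backend)
```

In a notebook the inline magic normally sets a suitable backend itself, so this is mostly useful in scripts or when a stale configuration keeps pointing at TkAgg.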

Error in filtering groupby results in pandas

I am trying to filter groupby results in pandas using the example provided at:
http://pandas.pydata.org/pandas-docs/dev/groupby.html#filtration
but getting the following error (pandas 0.12):
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-d0014484ff78> in <module>()
1 grouped = my_df.groupby('userID')
----> 2 grouped.filter(lambda x: len(x) >= 5)
/Users/zz/anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in filter(self, func, dropna, *args, **kwargs)
2092 res = path(group)
2093
-> 2094 if res:
2095 indexers.append(self.obj.index.get_indexer(group.index))
2096
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
What does it mean and how can it be resolved?
EDIT:
code to replicate the problem in pandas 0.12 stable
dff = pd.DataFrame({'A': list('222'), 'B': list('123'), 'C': list('123') })
dff.groupby('A').filter(lambda x: len(x) > 2)
This was a quasi-bug in 0.12 and will be fixed in 0.13; res is now protected by a type check:
if isinstance(res,(bool,np.bool_)):
    if res:
        add_indices()
I'm not quite sure how you got this error, however: the docs are compiled and run against actual pandas. You should make sure you are reading the docs for the correct version (in this case you were linking to dev rather than stable, although the API is largely unchanged).
The standard workaround is to do this using transform, which in this case would be something like the following (with g = dff.groupby('A')):
In [11]: dff[g.B.transform(lambda x: len(x) > 2)]
Out[11]:
A B C
0 2 1 1
1 2 2 2
2 2 3 3
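For comparison, a small sketch of both approaches on a pandas version where filter behaves as documented (0.13 or later, per the note above). The frame is the one from the question with one extra row added, purely as an illustration, so that a group is actually dropped. The transform('size') form is my own variation on the workaround, not the exact expression from the answer.

```python
import pandas as pd

# The question's frame plus one hypothetical row in a second group ('3').
dff = pd.DataFrame({'A': list('2223'), 'B': list('1234'), 'C': list('1234')})

# filter keeps every row of each group for which the function returns True.
filtered = dff.groupby('A').filter(lambda x: len(x) > 2)

# Equivalent via transform: broadcast each group's size to its rows,
# then use the result as a boolean mask.
sizes = dff.groupby('A')['B'].transform('size')
via_transform = dff[sizes > 2]

print(filtered)
```

Both forms keep the three rows of group '2' and drop the single row of group '3'.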