ValueError: Exception encountered when calling layer "tf.__operators__.getitem_20" (type SlicingOpLambda) - dataframe

I followed TensorFlow's tutorial and tried to recreate the code myself with multi-label input features, but encountered this error. I've recreated the sample code as follows.
DataFrame Creation:
sample_df = pd.DataFrame({"feature_1": [['aa', 'bb','cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']], "feature_2": [['aa', 'bb','cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']]})
Output:
feature_1 feature_2
0 [aa, bb, cc] [aa, bb, cc]
1 [cc, dd, ee] [cc, dd, ee]
2 [cc, aa, ee] [cc, aa, ee]
Input Layer:
inputs = {}
inputs['feature_1'] = tf.keras.Input(shape=(), name='feature_1', dtype=tf.string)
inputs['feature_2'] = tf.keras.Input(shape=(), name='feature_2', dtype=tf.string)
Output:
{'feature_1': <KerasTensor: shape=(None,) dtype=string (created by layer 'feature_1')>,
'feature_2': <KerasTensor: shape=(None,) dtype=string (created by layer 'feature_2')>}
Preprocessing Layer:
preprocessed = []
for name, column in sample_df.items():
    vocab = ['aa', 'bb', 'cc', 'dd', 'ee']
    lookup = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode='multi_hot')
    print(f'name: {name}')
    print(f'vocab: {vocab}\n')

    x = inputs[name][:, tf.newaxis]
    x = lookup(x)
    preprocessed.append(x)
Output:
name: feature_1
vocab: ['aa', 'bb', 'cc', 'dd', 'ee']
name: feature_2
vocab: ['aa', 'bb', 'cc', 'dd', 'ee']
[<KerasTensor: shape=(None, 6) dtype=float32 (created by layer 'string_lookup_27')>,
<KerasTensor: shape=(None, 6) dtype=float32 (created by layer 'string_lookup_28')>]
Model Creation:
preprocessed_result = tf.concat(preprocessed, axis=-1)
preprocessor = tf.keras.Model(inputs, preprocessed_result)
tf.keras.utils.plot_model(preprocessor, rankdir="LR", show_shapes=True)
Output:
<KerasTensor: shape=(None, 12) dtype=float32 (created by layer 'tf.concat_4')>
Error:
preprocessor(dict(sample_df.iloc[:1]))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
.../sample.ipynb Cell 63' in <cell line: 1>()
----> 1 preprocessor(dict(sample_df.iloc[:1]))
File ~/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
File ~/.local/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py:102, in convert_to_eager_tensor(value, ctx, dtype)
100 dtype = dtypes.as_dtype(dtype).as_datatype_enum
101 ctx.ensure_initialized()
--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Exception encountered when calling layer "tf.__operators__.getitem_20" (type SlicingOpLambda).
Failed to convert a NumPy array to a Tensor (Unsupported object type list).
Call arguments received:
• tensor=0 [aa, bb, cc]
Name: feature_2, dtype: object
• slice_spec=({'start': 'None', 'stop': 'None', 'step': 'None'}, 'None')
• var=None
Guide Link
Any help with the error, or with understanding it better, would be greatly appreciated.
Thank you very much in advance.

I have created a workaround for anyone who is interested or facing a similar issue. This is by no means a solution, just a workaround.
Workaround: since my multi-hot encodings are binary in nature, I just broke each one down into a feature of its own.
Sample code:
sample_df = pd.DataFrame({"feature_1": [['aa', 'bb', 'cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']]})

feature_1_labels = set()
for i in range(sample_df.shape[0]):
    feature_1_labels.update(sample_df.iloc[i]['feature_1'])

for label in sorted(feature_1_labels):
    sample_df[label] = 0

for i in range(sample_df.shape[0]):
    for label in sample_df.iloc[i]['feature_1']:
        sample_df.iloc[i, sample_df.columns.get_loc(label)] = 1

sample_df
Output:
feature_1 aa bb cc dd ee
0 [aa, bb, cc] 1 1 1 0 0
1 [cc, dd, ee] 0 0 1 1 1
2 [cc, aa, ee] 1 0 1 0 1
Note: doing so will significantly increase the number of input features, which is something to keep in mind.
Feel free to let me know a better workaround / if I'm wrong in any way :)
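Another direction I considered (a rough sketch only, not wired into the Keras preprocessor above): convert the list-valued column into a real tensor before it ever reaches the model, so the slicing layer never sees an object-dtype pandas Series of Python lists. Since every list in this sample has the same length, a plain tf.constant works; variable-length lists would need a tf.ragged.constant or padding instead.
vocab = ['aa', 'bb', 'cc', 'dd', 'ee']
lookup = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode='multi_hot')

# Convert the list-valued column to a (3, 3) string tensor first.
feature_1 = tf.constant(sample_df['feature_1'].tolist())
print(lookup(feature_1))  # shape (3, 6): one multi-hot row per example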

Related

Numpy: using np.pad() for an RGB image causing "operands could not be broadcast together with shapes (4,4,3) (4,4,5)" error

I have a function color_image_padding that takes an RGB image and adds one layer of zero padding to the borders. The image has dimensions (Width, Height, 3), with 3 representing the three color channels.
My code is:
import numpy as np
def color_image_padding(image: np.ndarray) -> np.ndarray:
    return np.pad(image, pad_width=1)
I'm seeing this error:
"operands could not be broadcast together with shapes (4,4,3) (4,4,5)"
It's probably the color channels that are causing this error. Doesn't np.pad split the image into 3 matrices and add the zero padding accordingly?
Thanks in advance for your assistance!
EDIT
See comments below... It turns out that the generalized function image_padding() was throwing an error message because some greyscale images (i.e. 2D Numpy matrices) were passed in. Here's a minimal example:
bar = np.ones((1, 3))
bar.ndim  # 2

def image_padding(image: np.ndarray, amt: int) -> np.ndarray:
    return np.pad(image, pad_width=((amt, amt), (amt, amt), (0, 0)))

image_padding(bar, 2)
Full Traceback:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8116/4065018867.py in <module>
----> 1 img(bar, 3)
~\AppData\Local\Temp/ipykernel_8116/1455868751.py in img(image, amt)
1 def img(image, amt):
----> 2 return np.pad(image, pad_width=((amt, amt), (amt, amt), (0, 0)))
<__array_function__ internals> in pad(*args, **kwargs)
~\anaconda3\lib\site-packages\numpy\lib\arraypad.py in pad(array, pad_width, mode, **kwargs)
741
742 # Broadcast to shape (array.ndim, 2)
--> 743 pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
744
745 if callable(mode):
~\anaconda3\lib\site-packages\numpy\lib\arraypad.py in _as_pairs(x, ndim, as_index)
516 # Converting the array with `tolist` seems to improve performance
517 # when iterating and indexing the result (see usage in `pad`)
--> 518 return np.broadcast_to(x, (ndim, 2)).tolist()
519
520
<__array_function__ internals> in broadcast_to(*args, **kwargs)
~\anaconda3\lib\site-packages\numpy\lib\stride_tricks.py in broadcast_to(array, shape, subok)
409 [1, 2, 3]])
410 """
--> 411 return _broadcast_to(array, shape, subok=subok, readonly=True)
412
413
~\anaconda3\lib\site-packages\numpy\lib\stride_tricks.py in _broadcast_to(array, shape, subok, readonly)
346 'negative')
347 extras = []
--> 348 it = np.nditer(
349 (array,), flags=['multi_index', 'refs_ok', 'zerosize_ok'] + extras,
350 op_flags=['readonly'], itershape=shape, order='C')
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (3,2) and requested shape (2,2)
Testing whether the image is greyscale or color resolves the issue:
def image_padding(image: np.ndarray, amt: int) -> np.ndarray:
    if image.ndim == 2:
        return np.pad(image, pad_width=(amt, amt))
    elif image.ndim == 3:
        return np.pad(image, pad_width=((amt, amt), (amt, amt), (0, 0)))
So this reproduces your error: using the three-term pad_width on a 2D array.
It's OK with 3D:
In [194]: x = np.ones((5,5,3),int)
In [196]: amt_padding=1;np.pad(x, pad_width=((amt_padding, amt_padding), (amt_padding, amt_padding), (0, 0))).shape
Out[196]: (7, 7, 3)
but if the array is 2D:
In [197]: amt_padding=1;np.pad(x[:,:,0], pad_width=((amt_padding, amt_padding), (amt_padding, amt_padding), (0, 0)))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [197], in <cell line: 1>()
----> 1 amt_padding=1;np.pad(x[:,:,0], pad_width=((amt_padding, amt_padding), (amt_padding, amt_padding), (0, 0)))
File <__array_function__ internals>:5, in pad(*args, **kwargs)
File ~\anaconda3\lib\site-packages\numpy\lib\arraypad.py:743, in pad(array, pad_width, mode, **kwargs)
740 raise TypeError('`pad_width` must be of integral type.')
742 # Broadcast to shape (array.ndim, 2)
--> 743 pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
745 if callable(mode):
746 # Old behavior: Use user-supplied function with np.apply_along_axis
747 function = mode
File ~\anaconda3\lib\site-packages\numpy\lib\arraypad.py:518, in _as_pairs(x, ndim, as_index)
514 raise ValueError("index can't contain negative values")
516 # Converting the array with `tolist` seems to improve performance
517 # when iterating and indexing the result (see usage in `pad`)
--> 518 return np.broadcast_to(x, (ndim, 2)).tolist()
File <__array_function__ internals>:5, in broadcast_to(*args, **kwargs)
File ~\anaconda3\lib\site-packages\numpy\lib\stride_tricks.py:411, in broadcast_to(array, shape, subok)
366 #array_function_dispatch(_broadcast_to_dispatcher, module='numpy')
367 def broadcast_to(array, shape, subok=False):
368 """Broadcast an array to a new shape.
369
370 Parameters
(...)
409 [1, 2, 3]])
410 """
--> 411 return _broadcast_to(array, shape, subok=subok, readonly=True)
File ~\anaconda3\lib\site-packages\numpy\lib\stride_tricks.py:348, in _broadcast_to(array, shape, subok, readonly)
345 raise ValueError('all elements of broadcast shape must be non-'
346 'negative')
347 extras = []
--> 348 it = np.nditer(
349 (array,), flags=['multi_index', 'refs_ok', 'zerosize_ok'] + extras,
350 op_flags=['readonly'], itershape=shape, order='C')
351 with it:
352 # never really has writebackifcopy semantics
353 broadcast = it.itviews[0]
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (3,2) and requested shape (2,2)
It's passing the task to np.nditer (via broadcast_to), which is raising the error. That would account for why I've never seen it before. I've explored nditer some, but it's not something I regularly use or recommend to others.
The _as_pairs helper expands the widths like this:
In [206]: np.lib.arraypad._as_pairs(1,3, as_index=True)
Out[206]: ((1, 1), (1, 1), (1, 1))
In [207]: np.lib.arraypad._as_pairs(((1,),(2,),(3,)),3, as_index=True)
Out[207]: [[1, 1], [2, 2], [3, 3]]
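If you want a single helper that handles both greyscale and color images without branching, one option (a sketch under the assumption that only the first two axes should be padded; pad_image is just a hypothetical name) is to build pad_width from the array's ndim:
def pad_image(image, amt):
    # Pad the first two axes by amt; leave any trailing axes (e.g. color channels) alone.
    pad_width = [(amt, amt), (amt, amt)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad_width=pad_width)

print(pad_image(np.ones((5, 5)), 1).shape)      # (7, 7)
print(pad_image(np.ones((5, 5, 3)), 1).shape)   # (7, 7, 3)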

KeyError: "None of [Index([('g', 'e', 'o', 'm', 'e', 't', 'r', 'y', '_', 'x')], dtype='object')] are in the [index]"

My code
rpg15_19.loc[rpg15_19.geometry_x != None, 'geom'] = rpg15_19['geometry_x']
rpg15_19.loc[rpg15_19.geometry_y != None, 'geom'] = rpg15_19['geometry_y']
rpg15_19.loc[rpg15_19.geometry != None, 'geom'] = rpg15_19['geometry']
Error message
KeyError Traceback (most recent call last)
Input In [5], in <cell line: 2>()
1 # we create a single geometry column
----> 2 rpg15_19.loc[rpg15_19.geometry_x != None, 'geom'] = rpg15_19['geometry_x']
3 rpg15_19.loc[rpg15_19.geometry_y != None, 'geom'] = rpg15_19['geometry_y']
4 rpg15_19.loc[rpg15_19.geometry != None, 'geom'] = rpg15_19['geometry']
File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\indexing.py:712, in _LocationIndexer.__setitem__(self, key, value)
710 else:
711 key = com.apply_if_callable(key, self.obj)
--> 712 indexer = self._get_setitem_indexer(key)
713 self._has_valid_setitem_indexer(key)
715 iloc = self if self.name == "iloc" else self.obj.iloc
File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\indexing.py:661, in _LocationIndexer._get_setitem_indexer(self, key)
659 if isinstance(key, tuple):
660 with suppress(IndexingError):
--> 661 return self._convert_tuple(key)
663 if isinstance(key, range):
664 return list(key)
File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\indexing.py:799, in _LocationIndexer._convert_tuple(self, key)
797 self._validate_key_length(key)
798 for i, k in enumerate(key):
--> 799 idx = self._convert_to_indexer(k, axis=i)
800 keyidx.append(idx)
802 return tuple(keyidx)
File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\indexing.py:1291, in _LocIndexer._convert_to_indexer(self, key, axis)
1289 return inds
1290 else:
-> 1291 return self._get_listlike_indexer(key, axis)[1]
1292 else:
1293 try:
File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\indexing.py:1327, in _LocIndexer._get_listlike_indexer(self, key, axis)
1324 ax = self.obj._get_axis(axis)
1325 axis_name = self.obj._get_axis_name(axis)
-> 1327 keyarr, indexer = ax._get_indexer_strict(key, axis_name)
1329 return keyarr, indexer
File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\indexes\base.py:5782, in Index._get_indexer_strict(self, key, axis_name)
5779 else:
5780 keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 5782 self._raise_if_missing(keyarr, indexer, axis_name)
5784 keyarr = self.take(indexer)
5785 if isinstance(key, Index):
5786 # GH 42790 - Preserve name from an Index
File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\indexes\base.py:5842, in Index._raise_if_missing(self, key, indexer, axis_name)
5840 if use_interval_msg:
5841 key = list(key)
-> 5842 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
5844 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
5845 raise KeyError(f"{not_found} not in index")
KeyError: "None of [Index([('g', 'e', 'o', 'm', 'e', 't', 'r', 'y', '_', 'x'), ('g', 'e', 'o', 'm', 'e', 't', 'r', 'y', '_', 'x')], dtype='object')] are in the [index]"
I want to create a single geometry column, but I get an error message like this. Can you help me, please?
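As an aside (and not necessarily the cause of the KeyError): comparing a column to None with != is discouraged in pandas, because it does not treat NaN/missing values the way you might expect; the idiomatic element-wise check is .notna(). A minimal sketch of that pattern, with made-up data standing in for the real GeoDataFrame:
import pandas as pd

rpg15_19 = pd.DataFrame({'geometry_x': ['POINT (0 0)', None],
                         'geometry_y': [None, 'POINT (1 1)']})

rpg15_19.loc[rpg15_19['geometry_x'].notna(), 'geom'] = rpg15_19['geometry_x']
rpg15_19.loc[rpg15_19['geometry_y'].notna(), 'geom'] = rpg15_19['geometry_y']
print(rpg15_19)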

Plotly-Dash live chart : Callback failed

I started exploring the Plotly-Dash library not long ago. There is a callback problem on the second graph, and I can't figure out the reason; I have been going through the components for a long time but have not found the cause. The callbacks on the first graph, however, do work.
Here is my code:
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.graph_objs as go
import sqlite3
import pytz
import pandas as pd
import numpy as np
from datetime import datetime
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = dash.Dash(__name__)
def get_value(n_intervals):
    timezone = pytz.timezone("Etc/UTC")
    utc_from = datetime(2021, 3, 23, tzinfo=timezone)
    base = sqlite3.connect('base_eurousd.db')
    cur = base.cursor()
    read_db = cur.execute('SELECT * FROM data_eurusd').fetchall()
    df = pd.DataFrame(read_db)
    df[0] = pd.to_datetime(df[0], unit='ms')
    df[3] = np.where(df[1].diff().lt(0) | df[2].diff().lt(0), df[3] * -1, df[3])
    return df
def get_value_analyze(n_intervals):
    timezone = pytz.timezone("Etc/UTC")
    utc_from = datetime(2021, 3, 23, tzinfo=timezone)
    base = sqlite3.connect('base_eurousd.db')
    cur = base.cursor()
    read_db = cur.execute('SELECT * FROM data_eurusd').fetchall()
    res = pd.DataFrame(read_db)
    res[0] = pd.to_datetime(res[0], unit='ms')
    res[3] = np.where(res[1].diff().lt(0) | res[2].diff().lt(0), res[3] * -1, res[3])
    funcs = {
        "bid_open": (1, 'first'),
        "bid_close": (1, 'last'),
        "tiks": (0, 'size'),
        "ask_open": (2, 'first'),
        "ask_close": (2, 'last'),
        "bid_min": (1, 'min'),
        "bid_max": (1, 'max'),
        "ask_min": (2, 'min'),
        "ask_max": (2, 'max'),
        "qvant": (3, 'quantile'),
        "sred": (3, 'mean'),
        "skew": (3, 'skew'),
        ">0": (3, lambda x: x.lt(0).sum()),
        "=0": (3, lambda x: x.eq(0).sum()),
        "<0": (3, lambda x: x.gt(0).sum())
    }
    res1 = res.groupby(pd.Grouper(key=0, freq="1T")).agg(**funcs)
    return res1
def serve_layout():
    return html.Div(
        children=[
            html.H4(children='111'),
            html.Div(id='my-id', children='''EURUSD'''),
            dcc.Graph(id='example-graph', animate=True, responsive=True),
            dcc.Interval(
                id='interval-component',
                interval=3 * 1000,
                n_intervals=0,
            ),
            html.Div(id='my-id2', children='''NO UPDATE'''),
            dcc.Graph(id='example-graph-2', animate=True, responsive=True),
            dcc.Interval(
                id='interval-component2',
                interval=3 * 1000,
                n_intervals=0,
            ),
        ],
    )

app.layout = serve_layout
@app.callback(
    Output('example-graph', 'figure'),
    [Input('interval-component', 'n_intervals')])
def update_graph(n_intervals):
    df = get_value(n_intervals)
    return {
        'data': [
            {'x': df[0], 'y': df[1], 'type': 'line', 'name': 'BID'},
        ],
        'layout': go.Layout(xaxis=dict(range=[min(df[0]), max(df[0])]),
                            yaxis=dict(range=[min(df[1]), max(df[1])])),
    }
@app.callback(
    Output('example-graph-2', 'figure'),
    [Input('interval-component2', 'n_intervals')])
def update_graph_2(n_intervals):
    res1 = get_value_analyze(n_intervals)
    return {
        'data': [
            {'x': res1[0], 'y': res1[3], 'type': 'line', 'name': 'TICK'},
            {'x': res1[0], 'y': res1[10], 'type': 'line', 'name': 'QUANTILE'},
            {'x': res1[0], 'y': res1[11], 'type': 'line', 'name': 'SPREAD'},
            {'x': res1[0], 'y': res1[12], 'type': 'line', 'name': 'SKEW'},
            {'x': res1[0], 'y': res1[13], 'type': 'line', 'name': 'BELOW ZERO'},
            {'x': res1[0], 'y': res1[13], 'type': 'line', 'name': 'BELOW ZERO'},
            {'x': res1[0], 'y': res1[14], 'type': 'line', 'name': 'EQUAL TO ZERO'},
            {'x': res1[0], 'y': res1[15], 'type': 'line', 'name': 'ABOVE ZERO'},
        ],
    }
if __name__ == '__main__':
    app.run_server(debug=True)
Can you please tell me the solution to this problem?
The error:
[2021-06-28 23:16:10,613] ERROR in app: Exception on /_dash-update-component [POST]
Traceback (most recent call last):
File "C:\Users\neket\AppData\Roaming\Python\Python39\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\neket\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\neket\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\neket\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "C:\Users\neket\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\_compat.py", line 39, in reraise
raise value
File "C:\Users\neket\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\neket\AppData\Local\Programs\Python\Python39\lib\site-packages\flask\app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "C:\Users\neket\AppData\Local\Programs\Python\Python39\lib\site-packages\dash\dash.py", line 1079, in dispatch
response.set_data(func(*args, outputs_list=outputs_list))
File "C:\Users\neket\AppData\Local\Programs\Python\Python39\lib\site-packages\dash\dash.py", line 1010, in add_context
output_value = func(*args, **kwargs) # %% callback invoked %%
File "C:\Users\neket\OneDrive\Рабочий стол\MT5 CODE\TEST_PANDAS_DASH.py", line 120, in update_graph_2
{'x': res1[0], 'y': res1[3], 'type': 'line', 'name': 'TICK'},
File "C:\Users\neket\AppData\Roaming\Python\Python39\site-packages\pandas\core\frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\neket\AppData\Roaming\Python\Python39\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 0
Output from df:
0 1 2 3
0 2021-06-30 18:33:24.471 1.18550 1.18552 2
1 2021-06-30 18:33:26.073 1.18552 1.18555 6
2 2021-06-30 18:33:26.173 1.18553 1.18555 2
3 2021-06-30 18:33:26.273 1.18553 1.18554 -4
4 2021-06-30 18:33:26.381 1.18552 1.18554 -2
... ... ... ... ..
36331 2021-07-01 07:50:54.126 1.18489 1.18491 2
36332 2021-07-01 07:51:01.297 1.18489 1.18491 2
36333 2021-07-01 07:51:13.108 1.18489 1.18493 4
36334 2021-07-01 07:51:16.078 1.18490 1.18493 2
36335 2021-07-01 07:51:31.511 1.18489 1.18493 -2
Output from res1:
bid_open bid_close tiks ask_open ... skew >0 =0 <0
0 ...
2021-06-28 08:30:00 1.19259 1.19259 5 1.19263 ... -0.567163 2 0 3
2021-06-28 08:31:00 1.19259 1.19269 57 1.19261 ... -0.182745 25 0 32
2021-06-28 08:32:00 1.19269 1.19278 90 1.19272 ... -0.008905 44 0 46
2021-06-28 08:33:00 1.19278 1.19277 62 1.19281 ... 0.010950 31 0 31
2021-06-28 08:34:00 1.19277 1.19282 61 1.19281 ... 0.033526 31 0 30
... ... ... ... ... ... ... .. .. ..
2021-06-28 17:14:00 1.19254 1.19248 157 1.19255 ... -0.050396 76 0 81
2021-06-28 17:15:00 1.19248 1.19257 81 1.19252 ... -0.161537 36 0 45
2021-06-28 17:16:00 1.19258 1.19247 107 1.19260 ... 0.046116 54 0 53
2021-06-28 17:17:00 1.19247 1.19275 77 1.19250 ... -0.415599 29 0 48
2021-06-28 17:18:00 1.19275 1.19304 68 1.19278 ... -0.398739 26 0 42
[529 rows x 15 columns]
Link to base : https://cloud.mail.ru/public/gtDi/BCgw1Jk8r
Let's compare the dataframe used in the update_graph callback (df) to the dataframe used in the update_graph_2 callback (res1):
# inside update_graph callback
df = get_value(n_intervals)
>>> df.columns
RangeIndex(start=0, stop=4, step=1)
# inside update_graph_2 callback
res1 = get_value_analyze(n_intervals)
>>> res1.columns
Index(
    [
        "bid_open",
        "bid_close",
        "tiks",
        "ask_open",
        "ask_close",
        "bid_min",
        "bid_max",
        "ask_min",
        "ask_max",
        "qvant",
        "sred",
        "skew",
        ">0",
        "=0",
        "<0",
    ],
    dtype="object",
)
This explains why the way you're passing data to the graph in the first callback works: the column labels 0 and 1 do exist in df.columns.
So in the second callback, instead of passing in res1[0], res1[1], res1[2], etc., pass in columns that actually exist, like res1["bid_open"], and take the x values from res1.index, since the datetime grouping key becomes the index after the groupby.
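For illustration, the second callback could then look roughly like this (a sketch only, with shortened trace names; the column names come from the funcs dict in the question):
@app.callback(
    Output('example-graph-2', 'figure'),
    [Input('interval-component2', 'n_intervals')])
def update_graph_2(n_intervals):
    res1 = get_value_analyze(n_intervals)
    return {
        'data': [
            # x comes from the index (the 1-minute bins); y from named columns.
            {'x': res1.index, 'y': res1['tiks'], 'type': 'line', 'name': 'ticks'},
            {'x': res1.index, 'y': res1['qvant'], 'type': 'line', 'name': 'quantile'},
            {'x': res1.index, 'y': res1['skew'], 'type': 'line', 'name': 'skew'},
            {'x': res1.index, 'y': res1['<0'], 'type': 'line', 'name': 'below zero'},
            {'x': res1.index, 'y': res1['=0'], 'type': 'line', 'name': 'equal to zero'},
            {'x': res1.index, 'y': res1['>0'], 'type': 'line', 'name': 'above zero'},
        ],
    }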

DBSCAN plot - The color values passed in plt.plot() is throwing ValueError

I am using DBSCAN to perform clustering on a dataset. I think the problem is the color argument passed to markerfacecolor in plt.plot(), which is not a single value; please let me know if I am wrong here. My features are latitude, longitude, speed_mph, speedlimit_mph, vehicle_id, and driver_id.
Here is my clustering code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

dbsc = DBSCAN(eps=.5, min_samples=5).fit(df_cont)
labels = dbsc.labels_
print(labels)

num_clusters = len(set(labels))
clusters = pd.Series([df_cont[labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))
# No of clusters : 5687

core_samples = np.zeros_like(labels, dtype=bool)
core_samples[dbsc.core_sample_indices_] = True

unique_labels = np.unique(labels)
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))

for (label, color) in zip(unique_labels, colors):
    class_member_mask = (labels == label)
    xy = df_cont[class_member_mask & core_samples]
    print("color:", color)
    # color: [ 0.61960784  0.00392157  0.25882353  1. ]
    plt.plot(xy.values[:, 0], xy.values[:, 1], marker='o', markerfacecolor=color, markersize=10)

    xy2 = df_cont[class_member_mask & ~core_samples]
    plt.plot(xy2.values[:, 0], xy2.values[:, 1], 'o', markerfacecolor=color, markersize=5)

plt.title("DBSCAN Driver - Speed MPH")
plt.xlabel("driver")
plt.ylabel("Speed")
plt.show()
Here is the error message thrown
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-105-0192647e6baf> in <module>()
3 xy = df_cont[class_member_mask & core_samples]
4 print("color:",color)
----> 5 plt.plot(xy.values[:,0],xy.values[:,1], marker='o', markerfacecolor = color, markersize = 10)
6
7 xy2 = df_cont[class_member_mask & ~core_samples]
/home/radiance/anaconda3/lib/python3.6/site-packages/matplotlib/pyplot.py in plot(*args, **kwargs)
3315 mplDeprecation)
3316 try:
-> 3317 ret = ax.plot(*args, **kwargs)
3318 finally:
3319 ax._hold = washold
/home/radiance/anaconda3/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
1896 warnings.warn(msg % (label_namer, func.__name__),
1897 RuntimeWarning, stacklevel=2)
-> 1898 return func(ax, *args, **kwargs)
1899 pre_doc = inner.__doc__
1900 if pre_doc is None:
/home/radiance/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_axes.py in plot(self, *args, **kwargs)
1404 kwargs = cbook.normalize_kwargs(kwargs, _alias_map)
1405
-> 1406 for line in self._get_lines(*args, **kwargs):
1407 self.add_line(line)
1408 lines.append(line)
/home/radiance/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py in _grab_next_args(self, *args, **kwargs)
405 return
406 if len(remaining) <= 3:
--> 407 for seg in self._plot_args(remaining, kwargs):
408 yield seg
409 return
/home/radiance/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs)
393 ncx, ncy = x.shape[1], y.shape[1]
394 for j in xrange(max(ncx, ncy)):
--> 395 seg = func(x[:, j % ncx], y[:, j % ncy], kw, kwargs)
396 ret.append(seg)
397 return ret
/home/radiance/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py in _makeline(self, x, y, kw, kwargs)
300 default_dict = self._getdefaults(None, kw)
301 self._setdefaults(default_dict, kw)
--> 302 seg = mlines.Line2D(x, y, **kw)
303 return seg
304
/home/radiance/anaconda3/lib/python3.6/site-packages/matplotlib/lines.py in __init__(self, xdata, ydata, linewidth, linestyle, color, marker, markersize, markeredgewidth, markeredgecolor, markerfacecolor, markerfacecoloralt, fillstyle, antialiased, dash_capstyle, solid_capstyle, dash_joinstyle, solid_joinstyle, pickradius, drawstyle, markevery, **kwargs)
418 self._markerfacecoloralt = None
419
--> 420 self.set_markerfacecolor(markerfacecolor)
421 self.set_markerfacecoloralt(markerfacecoloralt)
422 self.set_markeredgecolor(markeredgecolor)
/home/radiance/anaconda3/lib/python3.6/site-packages/matplotlib/lines.py in set_markerfacecolor(self, fc)
1204 if fc is None:
1205 fc = 'auto'
-> 1206 if self._markerfacecolor != fc:
1207 self.stale = True
1208 self._markerfacecolor = fc
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Also, I tried to do clustering using my latitude and longitude together with the other features, but DBSCAN threw an error that only two features are allowed (should I ask this as a separate question?):
dbsc = DBSCAN(eps=.5, min_samples=5, algorithm='ball_tree', metric='haversine').fit(np.radians(df_cont))
The contents of df_cont are:
{'Day': [1, 1, 1, 1, 1],
'Month': [6, 6, 6, 6, 6],
'Year': [2015, 2015, 2015, 2015, 2015],
'driver_id': [5693, 5693, 916461, 1145487, 1145487],
'latitude': [34.640141, 34.64373, 34.551254, 35.613663, 35.614525],
'longitude': [-77.938721,
-77.9394,
-78.78463,
-78.470596,
-78.47466999999999],
'speed_mph': [64, 64, 1, 62, 61],
'speedlimit_mph': [70, 70, 55, 70, 70],
'vehicle_id': [1208979, 1208979, 1262441, 1280223, 1280223]}
I got the error fixed by using a scatter plot: plt.scatter(xy.values[:, 0], xy.values[:, 1], s=10, c=color, marker='o')
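For anyone hitting the same traceback with plt.plot: the failure comes from matplotlib (the 1.x era shown in the traceback) comparing the markerfacecolor you pass against its current value with !=, which is ambiguous when that value is a NumPy array. A minimal sketch of two ways around it, assuming color is an RGBA row taken from plt.cm.Spectral as in the loop above:
import numpy as np
import matplotlib.pyplot as plt

colors = plt.cm.Spectral(np.linspace(0, 1, 5))   # (5, 4) array of RGBA rows
color = colors[0]                                # a 1-D ndarray, like in the loop
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 4.0, 9.0])

# Option 1: hand plt.plot a tuple instead of an ndarray.
plt.plot(x, y, marker='o', markerfacecolor=tuple(color), markersize=10)

# Option 2: use plt.scatter, whose c argument accepts a sequence of colors.
plt.scatter(x, y, s=10, c=[color], marker='o')
plt.show()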

Scoring returning a numpy.core.memmap instead of a numpy.Number in grid search

We are able (only within the context of our application at the moment) to reproduce the following problem on Ubuntu 15.04 and OS X with scikit-learn 0.17, when using GridSearchCV with a LogisticRegression on larger data sets.
...........................................................................
/Users/samuelhopkins/.virtualenvs/cpml/lib/python2.7/site-packages/sklearn/pipeline.py in fit(self=Pipeline(steps=[('cpencoder', <cpml.whitebox.Lin...s', refit=True, scoring=u'roc_auc', verbose=1))]), X= Unnamed: 0 member_id loan_a... 42.993346
[152536 rows x 45 columns], y=array([0, 1, 0, ..., 1, 1, 0]), **fit_params={})
160 y : iterable, default=None
161 Training targets. Must fulfill label requirements for all steps of
162 the pipeline.
163 """
164 Xt, fit_params = self._pre_transform(X, y, **fit_params)
--> 165 self.steps[-1][-1].fit(Xt, y, **fit_params)
self.steps.fit = undefined
Xt = array([[ 0.00000000e+00, 1.29659900e+06, 5....000000e+00, 0.00000000e+00, 4.29933458e+01]])
y = array([0, 1, 0, ..., 1, 1, 0])
fit_params = {}
166 return self
167
168 def fit_transform(self, X, y=None, **fit_params):
169 """Fit all the transforms one after the other and transform the
...........................................................................
/Users/samuelhopkins/.virtualenvs/cpml/lib/python2.7/site-packages/sklearn/grid_search.py in fit(self=GridSearchCV(cv=None, error_score='raise',
...jobs', refit=True, scoring=u'roc_auc', verbose=1), X=array([[ 0.00000000e+00, 1.29659900e+06, 5....000000e+00, 0.00000000e+00, 4.29933458e+01]]), y=array([0, 1, 0, ..., 1, 1, 0]))
799 y : array-like, shape = [n_samples] or [n_samples, n_output], optional
800 Target relative to X for classification or regression;
801 None for unsupervised learning.
802
803 """
--> 804 return self._fit(X, y, ParameterGrid(self.param_grid))
self._fit = <bound method GridSearchCV._fit of GridSearchCV(...obs', refit=True, scoring=u'roc_auc', verbose=1)>
X = array([[ 0.00000000e+00, 1.29659900e+06, 5....000000e+00, 0.00000000e+00, 4.29933458e+01]])
y = array([0, 1, 0, ..., 1, 1, 0])
self.param_grid = {'C': [1], 'class_weight': ['auto'], 'fit_intercept': [False], 'intercept_scaling': [1], 'penalty': ['l2']}
805
806
807 class RandomizedSearchCV(BaseSearchCV):
808 """Randomized search on hyper parameters.
...........................................................................
/Users/samuelhopkins/.virtualenvs/cpml/lib/python2.7/site-packages/sklearn/grid_search.py in _fit(self=GridSearchCV(cv=None, error_score='raise',
...jobs', refit=True, scoring=u'roc_auc', verbose=1), X=array([[ 0.00000000e+00, 1.29659900e+06, 5....000000e+00, 0.00000000e+00, 4.29933458e+01]]), y=array([0, 1, 0, ..., 1, 1, 0]), parameter_iterable=<sklearn.grid_search.ParameterGrid object>)
548 )(
549 delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
550 train, test, self.verbose, parameters,
551 self.fit_params, return_parameters=True,
552 error_score=self.error_score)
--> 553 for parameters in parameter_iterable
parameters = undefined
parameter_iterable = <sklearn.grid_search.ParameterGrid object>
554 for train, test in cv)
555
556 # Out is a list of triplet: score, estimator, n_test_samples
557 n_fits = len(out)
...........................................................................
/Users/samuelhopkins/.virtualenvs/cpml/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=2), iterable=<generator object <genexpr>>)
807 if pre_dispatch == "all" or n_jobs == 1:
808 # The iterable was consumed all at once by the above for loop.
809 # No need to wait for async callbacks to trigger to
810 # consumption.
811 self._iterating = False
--> 812 self.retrieve()
self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=2)>
813 # Make sure that we get a last message telling us we are done
814 elapsed_time = time.time() - self._start_time
815 self._print('Done %3i out of %3i | elapsed: %s finished',
816 (len(self._output), len(self._output),
---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError Mon Jan 18 11:58:09 2016
PID: 71840 Python 2.7.10: /Users/samuelhopkins/.virtualenvs/cpml/bin/python
...........................................................................
/Users/samuelhopkins/.virtualenvs/cpml/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
67 def __init__(self, iterator_slice):
68 self.items = list(iterator_slice)
69 self._size = len(self.items)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
75 return self._size
76
...........................................................................
/Users/samuelhopkins/.virtualenvs/cpml/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator=LogisticRegression(C=1, class_weight='auto', dua... tol=0.0001, verbose=0, warm_start=False), X=memmap([[ 0.00000000e+00, 1.29659900e+06, 5...000000e+00, 0.00000000e+00, 4.29933458e+01]]), y=memmap([0, 1, 0, ..., 1, 1, 0]), scorer=make_scorer(roc_auc_score, needs_threshold=True), train=array([ 49100, 49101, 49102, ..., 152533, 152534, 152535]), test=array([ 0, 1, 2, ..., 57517, 57522, 57532]), verbose=1, parameters={'C': 1, 'class_weight': 'auto', 'fit_intercept': False, 'intercept_scaling': 1, 'penalty': 'l2'}, fit_params={}, return_train_score=False, return_parameters=True, error_score='raise')
1545 " numeric value. (Hint: if using 'raise', please"
1546 " make sure that it has been spelled correctly.)"
1547 )
1548
1549 else:
-> 1550 test_score = _score(estimator, X_test, y_test, scorer)
1551 if return_train_score:
1552 train_score = _score(estimator, X_train, y_train, scorer)
1553
1554 scoring_time = time.time() - start_time
...........................................................................
/Users/samuelhopkins/.virtualenvs/cpml/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _score(estimator=LogisticRegression(C=1, class_weight='auto', dua... tol=0.0001, verbose=0, warm_start=False), X_test=memmap([[ 0.00000000e+00, 1.29659900e+06, 5...000000e+01, 0.00000000e+00, 4.29933458e+01]]), y_test=memmap([0, 1, 0, ..., 1, 1, 1]), scorer=make_scorer(roc_auc_score, needs_threshold=True))
1604 score = scorer(estimator, X_test)
1605 else:
1606 score = scorer(estimator, X_test, y_test)
1607 if not isinstance(score, numbers.Number):
1608 raise ValueError("scoring must return a number, got %s (%s) instead."
-> 1609 % (str(score), type(score)))
1610 return score
1611
1612
1613 def _permutation_test_score(estimator, X, y, cv, scorer):
ValueError: scoring must return a number, got 0.998981811748 (<class 'numpy.core.memmap.memmap'>) instead.
We have made several attempts to reproduce it outside of the context of the application, but are not having any luck. We have made the following change to cross_validation.py and it fixed our particular problem:
...
if isinstance(score, np.core.memmap):
    score = np.float(score)
if not isinstance(score, numbers.Number):
    raise ValueError("scoring must return a number, got %s (%s) instead."
...
Some more information:
we are on python 2.7
we are using a Pipeline to ensure all inputs are numeric
My questions are the following:
How might we go about reproducing this problem so as to cause the scorer to return a memmap?
Is anyone else having this particular problem?
Is the change we made in cross_validation.py actually a decent solution?
Yes, I had a similar case.
I fell in love with .memmap-s due to O/S limits on memory allocations, and I consider .memmap-s a smart tool for large-scale machine learning, using them in .fit()-s and other sklearn methods. (GridSearchCV() is not yet such a case, due to its adverse effect of pre-allocating memory for large hyperparameter grids with n_jobs = -1.)
How might we ... reproduce ...? As far as I remember, my case was similar, and the change from an "ordinary" numpy.ndarray to a numpy.memmap() started these artifacts. So, if you want to create one artificially, wrap your data in a .memmap()-ed representation of the array and make that be returned, even when it contains a single cell of data, instead of a plain number. You will receive a view into a .memmap()-ed sub-range of the generic array representation of that cell.
Is the change ... a decent solution? Well, I got rid of the .memmap()-ed wrapper by explicitly returning the cell value, referencing the result's [0] component. An enforced conversion via float() seems fine as well.
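A sketch of how one might sidestep the issue without patching cross_validation.py: force the scorer itself to return a plain Python float (this assumes roc_auc scoring as in the traceback; make_scorer and roc_auc_score are standard sklearn APIs, while roc_auc_as_float is just a hypothetical wrapper name):
from sklearn.metrics import make_scorer, roc_auc_score

def roc_auc_as_float(y_true, y_score):
    # float() strips any ndarray/memmap wrapper, so GridSearchCV's
    # isinstance(score, numbers.Number) check passes.
    return float(roc_auc_score(y_true, y_score))

safe_roc_auc = make_scorer(roc_auc_as_float, needs_threshold=True)
# Then pass scoring=safe_roc_auc instead of scoring=u'roc_auc' to GridSearchCV.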