TensorFlow tf.split() list index out of range?

Here's the code:
import tensorflow as tf

a = tf.constant([1, 2, 3, 4])
b = tf.constant([4])
c = tf.split(a, tf.squeeze(b))
Then it fails with:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jeff/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1203, in split
num = size_splits_shape.dims[0]
IndexError: list index out of range
But why?

The docs state,
If num_or_size_splits is a tensor, size_splits, then splits value into len(size_splits) pieces. The shape of the i-th piece has the same size as the value except along dimension axis where the size is size_splits[i].
Note that size_splits needs to be sliceable.
However, when you squeeze b, because it has only one element in your example, the result is a scalar with no dimensions, and a scalar cannot be sliced:
b_ = tf.squeeze(b)
b_[0] # error
Hence your error.
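A minimal sketch of a fix (assuming the goal is a single split of size 4): pass the 1-D tensor directly, or reshape the squeezed result back to rank 1 so it stays sliceable:
import tensorflow as tf

a = tf.constant([1, 2, 3, 4])
b = tf.constant([4])

# size_splits = [4] -> one piece of size 4
c = tf.split(a, b)

# or, if the squeeze is needed elsewhere, restore a dimension first
c = tf.split(a, tf.reshape(tf.squeeze(b), [-1]))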

Related

TypeError: '<' not supported between instances of 'float' and 'pandas._libs.interval.Interval'

I am sorting data within a dataframe: I take the data from one column, bin it into intervals of 2 units, and then want a list of the intervals created. This is the code:
import geopandas as gpd
import numpy as np
import pandas as pd

df = gpd.read_file(file)
final = df[df['i_h100'] >= 0]
final['H_bins'] = pd.cut(x=final['i_h100'], bins=np.arange(0, 120 + 2, 2))
HBins = list(np.unique(final['H_bins']))
With most of my dataframes this works totally fine but occasionally I get the following error:
Traceback (most recent call last):
File "/var/folders/bg/5n2pqdj53xv1lm099dt8z2lw0000gn/T/ipykernel_2222/1548505304.py", line 1, in <cell line: 1>
HBins = list(np.unique(final['H_bins']))
File "<__array_function__ internals>", line 180, in unique
File "/Users/heatherkay/miniconda3/envs/gpd/lib/python3.9/site-packages/numpy/lib/arraysetops.py", line 272, in unique
ret = _unique1d(ar, return_index, return_inverse, return_counts)
File "/Users/heatherkay/miniconda3/envs/gpd/lib/python3.9/site-packages/numpy/lib/arraysetops.py", line 333, in _unique1d
ar.sort()
TypeError: '<' not supported between instances of 'float' and 'pandas._libs.interval.Interval'
I don't understand why, or how to resolve this.
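A likely cause (an assumption, since only the traceback is shown): any i_h100 value that falls outside the 0-120 bin range becomes NaN, which is a float, and np.unique sorts its input, so it ends up comparing a float against Interval objects. Dropping the NaNs before collecting the unique intervals avoids the mixed-type comparison:
import numpy as np
import pandas as pd

# hypothetical reproduction: 130.0 falls outside the bins, so pd.cut
# marks it NaN (a float) among the Interval objects
binned = pd.cut(pd.Series([1.0, 5.0, 130.0]), bins=np.arange(0, 122, 2))

# np.unique(binned) would sort float NaN against Intervals -> TypeError;
# dropping NaNs first keeps the comparison Interval-to-Interval
HBins = list(binned.dropna().unique())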

Vaex Dataframe - Groupby on a calculated field - throws error

I have a vaex dataframe.
The column "Amount_INR" is calculated using the other three attributes using the function:
from forex_python.converter import CurrencyRates

def convert_curr(x, y, z):
    c = CurrencyRates()
    return c.convert(x, 'INR', y, z)

data_df_usd['Amount_INR'] = data_df_usd.apply(
    convert_curr,
    arguments=[data_df_usd.CURRENCY_CODE, data_df_usd.TOTAL_AMOUNT, data_df_usd.SUBSCRIPTION_START_DATE_DATE])
I'm trying to perform a groupby operation using the below code:
data_df_usd.groupby('CONTENTID', agg={'Revenue':vaex.agg.sum('Amount_INR')})
The code throws the below error:
RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/vaex/scopes.py", line 113, in evaluate
result = self[expression]
File "/usr/local/lib/python3.7/dist-packages/vaex/scopes.py", line 198, in __getitem__
raise KeyError("Unknown variables or column: %r" % (variable,))
KeyError: "Unknown variables or column: 'lambda_function(CURRENCY_CODE, TOTAL_AMOUNT, SUBSCRIPTION_START_DATE_DATE)'"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/forex_python/converter.py", line 103, in convert
converted_amount = rate * amount
TypeError: can't multiply sequence by non-int of type 'float'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.7/dist-packages/vaex/expression.py", line 1616, in _apply
scalar_result = self.f(*[fix_type(k[i]) for k in args], **{key: value[i] for key, value in kwargs.items()})
File "<ipython-input-7-8cc933ccf57d>", line 3, in convert_curr
return c.convert(x, 'INR', y, z)
File "/usr/local/lib/python3.7/dist-packages/forex_python/converter.py", line 107, in convert
"convert requires amount parameter is of type Decimal when force_decimal=True")
forex_python.converter.DecimalFloatMismatchError: convert requires amount parameter is of type Decimal when force_decimal=True
"""
The above exception was the direct cause of the following exception:
DecimalFloatMismatchError Traceback (most recent call last)
<ipython-input-13-cc7b243be138> in <module>
----> 1 data_df_usd.groupby('CONTENTID', agg={'Revenue':vaex.agg.sum('Amount_INR')})
According to the traceback, this does not look like it is related to the groupby; something is going wrong inside the convert_curr function. You get the error
TypeError: can't multiply sequence by non-int of type 'float'
See if you can evaluate Amount_INR in the first place.
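A minimal check (a sketch, not from the original answer): materialize a few rows of the virtual column, so any failure inside convert_curr surfaces before the groupby:
# evaluate a handful of rows of the computed column; if convert_curr
# is broken, the error appears here instead of inside the groupby
print(data_df_usd.evaluate('Amount_INR', i1=0, i2=5))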

TypeError: scatter() got multiple values for argument 'c'

I am trying to do hierarchical clustering on my MFCC array 'signal_mfcc', which is an ndarray with dimensions (198, 12): 198 audio frames/observations and 12 coefficients/dimensions.
I am using a random threshold of '250' with 'distance' for the criterion as shown below:
import scipy.cluster.hierarchy as hcluster

thresh = 250
print(signal_mfcc.shape)  # (198, 12)
clusters = hcluster.fclusterdata(signal_mfcc, thresh, criterion="distance")
With the specified threshold, the output variable 'clusters' is a sequence [1 1 1 ... 1] of length 198, i.e. shape (198,), which I assume assigns all the data points to a single cluster.
Then, I am using pyplot to plot scatter() with the following code:
import matplotlib.pyplot as plt
import numpy as np

# plotting
print(*(signal_mfcc.T).shape)  # prints: 12 198
plt.scatter(*np.transpose(signal_mfcc), c=clusters)
plt.axis("equal")
title = "threshold: %f, number of clusters: %d" % (thresh, len(set(clusters)))
plt.title(title)
plt.show()
The output is:
plt.scatter(*np.transpose(signal_mfcc), c=clusters)
TypeError: scatter() got multiple values for argument 'c'
The scatter plot would not show. Any clues as to what may have gone wrong?
Thanks in advance!
From this SO thread, you can see why you get this error.
From the scatter documentation, c is the 2nd optional argument and the 4th argument overall. The error means that unpacking np.transpose(signal_mfcc) passes at least 4 positional arguments (here 12, one per coefficient row), so the 4th one already fills c. Since you also pass c as a keyword, it is defined twice and Python cannot decide which value to use.
Example :
def temp(n, c=0):
    pass

temp(*[1, 2], c=1)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: temp() got multiple values for argument 'c'
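A sketch of a fix (assuming the intent is to plot the first two MFCC coefficients against each other): pass exactly one array for x and one for y instead of unpacking all 12 rows:
# keep only the first two coefficient rows so scatter() gets just x and y
x, y = np.transpose(signal_mfcc)[:2]
plt.scatter(x, y, c=clusters)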

df.Change[-1] producing errors

I'm trying to slice the last value of the series Change from my dataframe df.
The dataframe looks something like this
Change
0           1.000000
1           0.917727
2           1.000000
3           0.914773
4           0.933182
5           0.936136
6           0.957500
...
14466949    1.998392
14466950    2.002413
14466951    1.998392
14466952    1.974266
14466953    1.966224
When I input the following code
df.Change[0]
df.Change[100]
df.Change[100000]
I'm getting an output, but when I input
df.Change[-1]
I'm getting the following error
Traceback (most recent call last):
File "<pyshell#188>", line 1, in <module>
df.Change[-1]
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "C:\Python27\lib\site-packages\pandas\indexes\base.py", line 2139, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas\index.c:3338)
File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas\index.c:3041)
File "pandas/index.pyx", line 151, in pandas.index.IndexEngine.get_loc (pandas\index.c:3898)
KeyError: -1
Pretty much any negative number I use for slicing results in an error, and I'm not exactly sure why.
Thanks.
There are several ways to do this. What's happening is that pandas has no issue with df.Change[100] because 100 is a label in its index, while -1 is not. Your index just happens to coincide with ordinal positions. To index by ordinal position explicitly, use iloc.
df.Change.iloc[-1]
or
df.Change.values[-1]
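A minimal sketch of the distinction, on a hypothetical series (not from the question): with an integer index, square brackets do a label lookup, while iloc is always positional:
import pandas as pd

s = pd.Series([10, 20, 30])   # index labels are 0, 1, 2
s[2]          # 30 -> label lookup that happens to match the position
s.iloc[-1]    # 30 -> positional lookup, negative indices work
# s[-1]       # KeyError: -1 is not a label in the index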

NumPy: Check if field exists

I have a structured numpy array:
>>> import numpy
>>> a = numpy.zeros(1, dtype = [('field0', 'i2'), ('field1', 'f4')])
Then I start to retrieve some values. However, I do not know in advance whether my array contains a certain field, so if I try to access a non-existent field, I expectedly get an IndexError:
>>> a[0]['field0']
0
>>> a[0]['field2']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: invalid index
I could of course go with try/except; however, this can potentially mask other errors, since the IndexError does not specify at which level the non-existent index was hit:
>>> try:
... a[9999]['field2']['subfield3']
... except IndexError:
... print('Some index does not exist')
...
Some index does not exist
I also tried treating the numpy array like a list, but this does not work:
>>> if 'field0' in a[0]:
... print('yes')
... else:
... print('no')
...
no
Hence the question: is there a way to check whether a given field exists in a structured numpy array?
You could check .dtype.names or .dtype.fields:
>>> a.dtype.names
('field0', 'field1')
>>> 'field0' in a.dtype.names
True
>>> a.dtype.fields
mappingproxy({'field0': (dtype('int16'), 0), 'field1': (dtype('float32'), 2)})
>>> 'field0' in a.dtype.fields
True
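If you need this check often, it can be wrapped in a small helper (a sketch; the guard matters because dtype.names is None for non-structured arrays, and a membership test against None would raise a TypeError):
def has_field(arr, name):
    # dtype.names is None for plain (non-structured) arrays, so guard first
    return arr.dtype.names is not None and name in arr.dtype.names

has_field(a, 'field0')  # True
has_field(a, 'field2')  # False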