Errors using onehot_encode: incorrect input format? - mxnet

I'm trying to use the mx.nd.onehot_encode function, which should be straightforward, but I'm getting errors that are difficult to parse. Here is the example usage I'm trying.
m0 = mx.nd.zeros(15)
mx.nd.onehot_encode(mx.nd.array([0]), m0)
I expect this to return a 15-dimensional vector (at the same address as m0) with only the first element set to 1. Instead I get the error:
src/ndarray/./ndarray_function.h:73: Check failed: index.ndim() == 1 && proptype.ndim() == 2 OneHotEncode only support 1d index.
Neither ndarray is of dimension 2, so why am I getting this error? Is there some other input format I should be using?

It seems that mxnet.ndarray.onehot_encode requires the target ndarray to explicitly have the shape [1, X].
I tried:
m0 = mx.nd.zeros((1, 15))
mx.nd.onehot_encode(mx.nd.array([0]), m0)
It reported no error.
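For completeness, a minimal sketch (assuming the mxnet 1.x NDArray API from the question) showing the encoded result; the target's shape is (number of indices, depth):

import mxnet as mx

m0 = mx.nd.zeros((1, 15))  # 2-D target: one row of indices, 15 classes
out = mx.nd.onehot_encode(mx.nd.array([0]), m0)
print(out.asnumpy())  # [[1. 0. 0. ... 0.]] -- only the first column is set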


DeprecationWarning: elementwise comparison failed; this will raise an error in the future. While broadcasting large arrays

I'm trying to compare each element of one array with each element of another. The arrays are fairly large:
arr1.shape = (59913,)
arr2.shape = (988114,)
To do this comparison, I use the following code:
A = np.array(arr1[:])[:, np.newaxis] == np.array(arr2[:])[np.newaxis, :]
np.sum(A)
The strange thing is that when I limit the length of either arr1 or arr2, everything works as expected and I get a boolean matrix of shape (len(arr1), len(arr2)) as A, but when I try to run it for the full arrays I get the following warning:
DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
and a single output: False
Any help would be highly appreciated.
Rather late, but I just encountered the same error, and I found out that it is due to a lack of memory.
res = np.where(~((a[:, None] == b).all(-1).any(-1)))
a.shape = (14731, 128) b.shape = (3725, 128)
__main__:5: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
Traceback (most recent call last):
AttributeError: 'bool' object has no attribute 'all'
That's what I got. By gradually reducing the size of b, I realized that at some point it starts working properly.
So I think it is a matter of insufficient memory, and the message displayed is rather misleading: here the broadcast comparison a[:, None] == b would have to allocate a (14731, 3725, 128) boolean array, and when that elementwise comparison fails internally, NumPy falls back to a scalar comparison and simply returns False along with the DeprecationWarning.
My goal was to keep the rows of a that are not present in b. I found a solution using pandas.
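A hedged sketch of one pandas-based way to do that (not necessarily the poster's exact code): an anti-join via merge with indicator=True keeps the rows of a that never appear in b, without materializing the full broadcast comparison.

import numpy as np
import pandas as pd

a = np.random.randint(0, 5, size=(14731, 128))
b = np.random.randint(0, 5, size=(3725, 128))

df_a = pd.DataFrame(a)
df_b = pd.DataFrame(b).drop_duplicates()  # dedupe so the left join can't multiply rows
merged = df_a.merge(df_b, how='left', indicator=True)  # joins on all shared columns
rows_only_in_a = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')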

Using numpy for polynomial fit on pandas dataframe

I have a dataframe containing astronomical data. I'm using statsmodels.formula.api to try to apply a polynomial fit to the dataframe, using columns labelled log_z, U, B, V, and other variables. I've got this far:
sources['log_z'] = np.log10(sources.z)
mask = ~np.isnan((B-I)) & ~np.isnan(log_z)
model = ols(formula='(B-I) + np.power((U-R),2) ~ log_z', data = [log_z[mask], (B-I)[mask]]).fit()
but I keep getting
PatsyError: Error evaluating factor: TypeError: list indices must be integers or slices, not str
(B-I) + np.power((U-R),2) ~ log_z
^^^^^^^^^^^^^^^^^
even though I'm passing arrays into the function. I get the same error message (apart from the last line) no matter what arrays I use, or how I format them. Can anyone see what I'm doing wrong?
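A hedged guess at the cause: patsy resolves the names in a formula by string-indexing data (e.g. data['B']), so data must be a DataFrame or dict of named arrays; passing a list of arrays triggers exactly this "list indices must be integers or slices, not str" error. A sketch of a likely fix, with the sources frame and column names (B, I, U, R, z) assumed from the question:

import numpy as np
from statsmodels.formula.api import ols

sources['log_z'] = np.log10(sources['z'])
sources['color'] = sources['B'] - sources['I']  # the B-I response, precomputed
df = sources.dropna(subset=['color', 'log_z', 'U', 'R'])
model = ols(formula='color ~ log_z + np.power(U - R, 2)', data=df).fit()
print(model.summary())

(The polynomial term is moved to the right-hand side here; in a patsy formula the left-hand side is the response, and arithmetic on it would need I(...) wrapping in any case.)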

TypeError: 'TensorShape' object is not callable

I am new to TensorFlow programming; I was digging into some functions and got this error in the snippet:
with tf.Session() as sess_1:
    c = tf.constant(5)
    d = tf.constant(6)
    e = c + d
    print(sess_1.run(e))
    print(sess_1.run(e.shape()))
Error found:
Traceback (most recent call last):
File "C:/Users/Ashu/PycharmProjects/untitled/Bored.py", line 15, in <module>
print(sess_1.run(e.shape()))
TypeError: 'TensorShape' object is not callable
I didn't find it here, so can anyone please clarify this silly doubt, as I am a new learner? Sorry for any typing mistakes!
I have one more doubt: when I use the eval() function by itself, it doesn't print anything in PyCharm; I had to use it together with the print() method. My doubt is that when the print() method is used, it doesn't print the dtype of the tensor; it simply prints the tensor or Python object value in PyCharm. (Why am I not getting the output in a format like array([1., 1.], dtype=float32)?) Is this the PyCharm way of printing a tensor in the new version, or is it something I am doing wrong? So excited to know what's behind this; please help, and pardon me if I am wrong anywhere.
One confusing aspect of tensorflow for beginners is that there are two types of shape: the dynamic shape, given by tf.shape(x), and the static shape, given by x.shape (assuming x is a tensor). While they represent the same concept, they are used very differently.
Static shape is the shape of a tensor as known at graph-construction time. It's a data type in its own right, but it can be converted to a list using as_list().
x = tf.placeholder(tf.float32, shape=(None, 3, 4))
static_shape = x.shape
shape_list = x.shape.as_list()
print(shape_list) # [None, 3, 4]
y = tf.reduce_sum(x, axis=1)
print(y.shape.as_list()) # [None, 4]
During operations, tensorflow tracks static shapes as best it can. In the above example, y's shape was calculated from the partially known shape of x. Note we haven't even created a session, but the static shape is still known.
Since the batch size is not known, you can't use the first entry of the static shape (None) in calculations.
z = tf.reduce_sum(x) / tf.cast(x.shape.as_list()[0], tf.float32) # ERROR
(We could have divided by x.shape.as_list()[1], since that dimension is known statically, but that wouldn't demonstrate anything here.)
If we need to use a value which is not known statically - i.e. at graph construction time - we can use the dynamic shape of x. The dynamic shape is a tensor - like other tensors in tensorflow - which is evaluated using a session.
z = tf.reduce_sum(x) / tf.cast(tf.shape(x)[0], tf.float32) # all good!
You can't call as_list on the dynamic shape, nor can you inspect its values without going through a session evaluation.
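A minimal illustration of the difference (a sketch, assuming the TF 1.x API used throughout this answer):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 3, 4))
print(x.shape.as_list())  # [None, 3, 4] -- known without a session
dyn = tf.shape(x)         # a tensor; needs a session and a feed to evaluate
with tf.Session() as sess:
    print(sess.run(dyn, feed_dict={x: np.zeros((2, 3, 4))}))  # [2 3 4]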
As stated in the documentation, you can only call a session's run method with tensors, operations, or lists of tensors/operations. Your last line of code writes e.shape(), but e.shape is a TensorShape object, not a method, so calling it is what raises TypeError: 'TensorShape' object is not callable. Even without the parentheses, sess_1.run(e.shape) would fail, because the session can't execute a TensorShape argument; read e.shape directly, or run tf.shape(e) if you want the shape as a tensor.
When you call print with a tensor, the system prints the tensor's content. If you want to print the tensor's type, use code like print(type(tensor)).
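Putting both answers together, a corrected version of the snippet from the question might look like this (a sketch, assuming the TF 1.x API as above):

import tensorflow as tf

with tf.Session() as sess_1:
    c = tf.constant(5)
    d = tf.constant(6)
    e = c + d
    print(sess_1.run(e))            # 11
    print(e.shape)                  # static shape: () -- no parentheses, no run needed
    print(sess_1.run(tf.shape(e)))  # dynamic shape, evaluated by the session: []
    print(type(e))                  # the tensor's Python type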

How to print out the minibatchData's value?

minibatch_size = 5
data = reader.next_minibatch(minibatch_size, input_map={  # fetch minibatch
    x: reader.streams.query,
    y: reader.streams.slot_labels
})
evaluator = C.eval.Evaluator(loss, progress_printer)
evaluator.test_minibatch(data)
print("labels=", data[y].as_sequences())
I got an error for data[y].as_sequences() saying:
raise ValueError('cannot convert sparse value to sequences '
ValueError: cannot convert sparse value to sequences without the corresponding variable
How do I fix this? What is a variable? What should I put?
data[y].as_sequences(variable=y) should do the trick, but I wouldn't recommend it.
On larger datasets, as_sequences and asarray quickly cause an out-of-memory exception to be thrown.
I ended up using this:
true_labels = cntk.ops.argmax(labels_input).eval(minibatch[labels_input]).astype(int)
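For the variable names used in the question (y for the labels, data for the minibatch), an untested adaptation of that line might look like the sketch below; it mirrors the call pattern above, evaluating an argmax node on the sparse value instead of densifying everything with as_sequences.

import cntk as C

true_labels = C.ops.argmax(y).eval(data[y]).astype(int)
print("labels=", true_labels)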

Pandas: Location of a row with error

I am pretty new to Pandas and trying to find out where my code breaks. Say, I am doing a type conversion:
df['x']=df['x'].astype('int')
...and I get the error: "ValueError: invalid literal for long() with base 10: '1.0692e+06'".
In general, if I have 1000 entries in the dataframe, how can I find out which entry causes the break? Is there anything in ipdb to output the current location (i.e. where the code broke)? Basically, I am trying to pinpoint which value cannot be converted to int.
The error you are seeing might be due to the value(s) in the x column being strings:
In [15]: df = pd.DataFrame({'x':['1.0692e+06']})
In [16]: df['x'].astype('int')
ValueError: invalid literal for long() with base 10: '1.0692e+06'
Ideally, the problem can be avoided by making sure the values stored in the
DataFrame are already ints not strings when the DataFrame is built.
How to do that depends of course on how you are building the DataFrame.
After the fact, the DataFrame could be fixed using applymap:
import ast
df = df.applymap(ast.literal_eval).astype('int')
but calling ast.literal_eval on each value in the DataFrame could be slow, which is why fixing the problem from the beginning is the best alternative.
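A hedged aside, not from the original answer: for this particular failure, pd.to_numeric is a vectorized alternative; it parses float strings like '1.0692e+06' that astype('int') rejects, and with errors='coerce' it marks truly unparseable entries as NaN so the offending rows can be listed directly.

import pandas as pd

df = pd.DataFrame({'x': ['1.0692e+06', '42', 'oops']})
parsed = pd.to_numeric(df['x'], errors='coerce')
print(df.loc[parsed.isna(), 'x'])   # index 2: 'oops' -- the unparseable rows
print(parsed.dropna().astype(int))  # clean integer conversion of the rest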
Usually you could drop into a debugger when an exception is raised and inspect the problematic value of the row.
However, in this case the exception is happening inside the call to astype, which is a thin wrapper around C-compiled code. The C-compiled code is doing the looping through the values in df['x'], so the Python debugger is not helpful here -- it won't allow you to introspect on what value the exception is being raised from within the C-compiled code.
There are many important parts of Pandas and NumPy written in C, C++, Cython or Fortran, and the Python debugger will not take you inside those non-Python pieces of code where the fast loops are handled.
So instead I would revert to a low-brow solution: iterate through the values in a Python loop and use try...except to catch the first error:
df = pd.DataFrame({'x': ['1.0692e+06']})
for i, item in enumerate(df['x']):
    try:
        int(item)
    except ValueError:
        print('ERROR at index {}: {!r}'.format(i, item))
yields
ERROR at index 0: '1.0692e+06'
I hit the same problem, and as I have a big input file (3 million rows), enumerating all the rows would take a long time. Therefore I wrote a binary search to locate the offending row.
import pandas as pd
import sys
def binarySearch(df, l, r, func):
    while l <= r:
        mid = l + (r - l) // 2
        # Check if we hit the exception exactly at mid
        result = func(df, mid, mid + 1)
        if result:
            return mid, result
        result = func(df, l, mid)
        if result is None:
            # No exception in the left half, so ignore it
            l = mid + 1
        else:
            r = mid - 1
    # If we reach here, then no offending element was present
    return -1, None

def check(df, start, end):
    result = None
    try:
        # In my case, I want to find out which row causes the failure
        df.iloc[start:end].uid.astype(int)
    except Exception as e:
        result = str(e)
    return result
df = pd.read_csv(sys.argv[1])
index, result = binarySearch(df, 0, len(df), check)
print("index: {}".format(index))
print(result)
To report all rows which fail to map due to any exception:
df.apply(my_function)  # throws various exceptions at unknown rows

# print the exception, index, and row content
for i, row in df.iterrows():
    try:
        my_function(row)
    except Exception as e:
        print('Error at index {}: {!r}'.format(i, row))
        print(e)