Related
I'm new to numpy, and I have an interesting observation on the broadcasting. When I'm adding a 3x5 array directly to a 3x1 array, and update the original 3x1 array with the result, there is no broadcasting issue.
import numpy as np
total = np.random.uniform(-1,1, size=(3))[:,np.newaxis]
print(f'init = \n {total}')
for i in range(3):
total = total + np.ones(shape=(3,5))
print(f'total_{i} = \n {total}')
However, if i'm using '+=' operator to increment the 3x1 array with the value of 3x5 array, there is a broadcasting issue. May I know which rule of numpy broadcasting did I violate in the latter case?
total = np.random.uniform(-1,1, size=(3))[:,np.newaxis]
print(f'init = \n {total}')
for i in range(3):
total += np.ones(shape=(3,5))
print(f'total_{i} = \n {total}')
Thank you!
hawkoli1987
according to add function overridden in numpy array,
def add(x1, x2, *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__
"""
add(x1, x2, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])
Add arguments element-wise.
Parameters
----------
x1, x2 : array_like
The arrays to be added.
If ``x1.shape != x2.shape``, they must be broadcastable to a common
shape (which becomes the shape of the output).
out : ndarray, None, or tuple of ndarray and None, optional
A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated array is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
add function returns a freshly-allocated array when dimensions of arrays are different.
In python, a=a+b and a+=b aren't absolutly same. + calls __add__ function and += calls __iadd__.
a = np.array([1, 2])
b = np.array([3, 4])
first_id = id(a)
a = a + b
second_id = id(a)
assert first_id == second_id # False
a = np.array([1, 2])
b = np.array([3, 4])
first_id = id(a)
a += b
second_id = id(a)
assert first_id == second_id # True
+= function does not create new objects and updates the value to the same address.
numpy add function updates an existing instance when adding an array of the same dimensions, but returns a new object when adding arrays of different dimensions. So when use += functions, the two functions must have the same dimension because the results of the add function must be updated on the same object.
For example,
a = np.array()
total = np.random.uniform(-1,1, size=(3))[:,np.newaxis]
print(id(total))
for i in range(3):
total += np.ones(shape=(3,1))
print(id(total))
id(total) are all same because add function just updates the instance in same address because dimmension of two arrays are same.
In [29]: arr = np.zeros((1,3))
The actual error message is:
In [30]: arr += np.ones((2,3))
Traceback (most recent call last):
Input In [30] in <cell line: 1>
arr += np.ones((2,3))
ValueError: non-broadcastable output operand with shape (1,3) doesn't match the broadcast shape (2,3)
I read that as say that arr on the left is "non-broadcastable", where as arr+np.ones((2,3)) is the result of broadcasting. The wording may be awkward; it's probably produced in some deep compiled function where it makes more sense.
We get a variant on this when we try to assign an array to a slice of an array:
In [31]: temp = arr + np.ones((2,3))
In [32]: temp.shape
Out[32]: (2, 3)
In [33]: arr[:] = temp
Traceback (most recent call last):
Input In [33] in <cell line: 1>
arr[:] = temp
ValueError: could not broadcast input array from shape (2,3) into shape (1,3)
This is clearer, saying that the RHS (2,3) cannot be put into the LHS (1,3) slot.
Or trying to put the (2,3) into one "row" of arr:
In [35]: arr[0] = temp
Traceback (most recent call last):
Input In [35] in <cell line: 1>
arr[0] = temp
ValueError: could not broadcast input array from shape (2,3) into shape (3,)
arr[0] = arr works because it tries to put a (1,3) into a (3,) shape - that's a workable broadcasting combination.
arr[0] = arr.T tries to put a (3,1) into a (3,), and fails.
Question
What is the difference between P[(n),(m)] and P[[n],[m]]? I had thought they both to select one element, hence a scalar but () and [] produce different shape. A scalar for () and an array for [].
What is the design decisions and thoughts behind this?
import numpy as np
a = np.array(1).reshape(1, -1)
print(a)
---
[[1]]
b = a[
(0),
(0)
]
print(b)
print(b.shape)
print(b.ndim)
---
1
()
0
c = a[
[0],
[0]
]
print(c)
print(c.shape)
print(c.ndim)
---
[1]
(1,)
1
This is because in order to create a tuple with one element only, you need to add a comma.
b = a[(0,), (0,)] returns the same result as c = a[[0],[0]].
You can try this:
print(type((0)))
print(type((0,)))
Output:
<class 'int'>
<class 'tuple'>
If you want to select one element only, you can simply do a[0,0].
I seem to have a problem of argmax getting the right index for my array. It suppose to return a value 0 but I got a value 18. Here is an example:
>>> a = tf.constant([-0.00000000e+00, 1.31838050e-07, 7.86561927e-11,1.95077332e-09, 4.71118966e-09, 2.67971922e-10,3.62677839e-11 ,9.57063651e-10, 3.25077543e-09, 6.84045816e-08, 2.71129057e-08, 4.34358327e-10, 3.01831915e-09, 6.50069998e-09,1.40559550e-10, 4.57989238e-08, 1.42130885e-08, 9.68442881e-10, 8.28957923e-07,6.10620265e-09, 2.63989475e-09])
>>> a.eval()
array([ -0.00000000e+00, 1.31838050e-07, 7.86561927e-11,
1.95077332e-09, 4.71118966e-09, 2.67971922e-10,
3.62677839e-11, 9.57063651e-10, 3.25077543e-09,
6.84045816e-08, 2.71129057e-08, 4.34358327e-10,
3.01831915e-09, 6.50069998e-09, 1.40559550e-10,
4.57989238e-08, 1.42130885e-08, 9.68442881e-10,
8.28957923e-07, 6.10620265e-09, 2.63989475e-09], dtype=float32)
>>> b = tf.argmax(a,0)
>>> b.eval()
>>> 18
a[18]=8.2895792e-07 > a[0]=0
There is no problem, a[18] is the max value in your array, all your numbers are positive...
Strange error from numpy via matplotlib when trying to get a histogram of a tiny toy dataset. I'm just not sure how to interpret the error, which makes it hard to see what to do next.
Didn't find much related, though this nltk question and this gdsCAD question are superficially similar.
I intend the debugging info at bottom to be more helpful than the driver code, but if I've missed something, please ask. This is reproducible as part of an existing test suite.
if n > 1:
return diff(a[slice1]-a[slice2], n-1, axis=axis)
else:
> return a[slice1]-a[slice2]
E TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
../py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py:1567: TypeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(1567)diff()
-> return a[slice1]-a[slice2]
(Pdb) bt
[...]
py2.7.11-venv/lib/python2.7/site-packages/matplotlib/axes/_axes.py(5678)hist()
-> m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(606)histogram()
-> if (np.diff(bins) < 0).any():
> py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(1567)diff()
-> return a[slice1]-a[slice2]
(Pdb) p numpy.__version__
'1.11.0'
(Pdb) p matplotlib.__version__
'1.4.3'
(Pdb) a
a = [u'A' u'B' u'C' u'D' u'E']
n = 1
axis = -1
(Pdb) p slice1
(slice(1, None, None),)
(Pdb) p slice2
(slice(None, -1, None),)
(Pdb)
I got the same error, but in my case I am subtracting dict.key from dict.value. I have fixed this by subtracting dict.value for corresponding key from other dict.value.
cosine_sim = cosine_similarity(e_b-e_a, w-e_c)
here I got error because e_b, e_a and e_c are embedding vector for word a,b,c respectively. I didn't know that 'w' is string, when I sought out w is string then I fix this by following line:
cosine_sim = cosine_similarity(e_b-e_a, word_to_vec_map[w]-e_c)
Instead of subtracting dict.key, now I have subtracted corresponding value for key
I had a similar issue where an integer in a row of a DataFrame I was iterating over was of type numpy.int64. I got the
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
error when trying to subtract a float from it.
The easiest fix for me was to convert the row using pd.to_numeric(row).
Why is it applying diff to an array of strings.
I get an error at the same point, though with a different message
In [23]: a=np.array([u'A' u'B' u'C' u'D' u'E'])
In [24]: np.diff(a)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-24-9d5a62fc3ff0> in <module>()
----> 1 np.diff(a)
C:\Users\paul\AppData\Local\Enthought\Canopy\User\lib\site-packages\numpy\lib\function_base.pyc in diff(a, n, axis)
1112 return diff(a[slice1]-a[slice2], n-1, axis=axis)
1113 else:
-> 1114 return a[slice1]-a[slice2]
1115
1116
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
Is this a array the bins parameter? What does the docs say bins should be?
I am fairly new to this myself, but I had a similar error and found that it is due to a type casting issue. I was trying to concatenate rather than take the difference but I think the principle is the same here. I provided a similar answer on another question so I hope that is OK.
In essence you need to use a different data type cast, in my case I needed str not float, I suspect yours is the same so my suggested solution is. I am sorry I cannot test it before suggesting but I am unclear from your example what you were doing.
return diff(str(a[slice1])-str(a[slice2]), n-1, axis=axis)
Please see my example code below for the fix to my code, the change occurs on the third to last line. The code is to produce a basic random forest model.
import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation
Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Ues the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"
npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay = npArray.shape
names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1] .astype(float)
X = preprocessing.scale(X)
XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)
# Predictions results initialised
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain) # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)
with open(test_name,'a') as fpred :
lenpredictions = len(RFpreds)
lentrue = yTest.shape[0]
if lenpredictions == lentrue :
fpred.write("Names/Label,, Prediction Random Forest,, True Value,\n")
for i in range(0,lenpredictions) :
fpred.write(RFpreds[i]+",,"+yTest[i]+",\n")
else :
print "ERROR - names, prediction and true value array size mismatch."
This leads to an error of;
Traceback (most recent call last):
File "min_example.py", line 40, in <module>
fpred.write(RFpreds[i]+",,"+yTest[i]+",\n")
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')
The solution is to make each variable a str() type on the third to last line then write to file. No other changes to then code have been made from the above.
import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation
Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Ues the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"
npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay = npArray.shape
names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1] .astype(float)
X = preprocessing.scale(X)
XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)
# Predictions results initialised
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain) # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)
with open(test_name,'a') as fpred :
lenpredictions = len(RFpreds)
lentrue = yTest.shape[0]
if lenpredictions == lentrue :
fpred.write("Names/Label,, Prediction Random Forest,, True Value,\n")
for i in range(0,lenpredictions) :
fpred.write(str(RFpreds[i])+",,"+str(yTest[i])+",\n")
else :
print "ERROR - names, prediction and true value array size mismatch."
These examples are from a larger code so I hope the examples are clear enough.
I think #James is right. I got stuck by same error while working on Polyval(). And yeah solution is to use the same type of variabes. You can use typecast to cast all variables in the same type.
BELOW IS A EXAMPLE CODE
import numpy
P = numpy.array(input().split(), float)
x = float(input())
print(numpy.polyval(P,x))
here I used float as an output type. so even the user inputs the INT value (whole number). the final answer will be typecasted to float.
I ran into the same issue, but in my case it was just a Python list instead of a Numpy array used. Using two Numpy arrays solved the issue for me.
I try to compare the results of some numpy.array calculations with expected results, and I constantly get false comparison, but the printed arrays look the same, e.g:
def test_gen_sine():
A, f, phi, fs, t = 1.0, 10.0, 1.0, 50.0, 0.1
expected = array([0.54030231, -0.63332387, -0.93171798, 0.05749049, 0.96724906])
result = gen_sine(A, f, phi, fs, t)
npt.assert_array_equal(expected, result)
prints back:
> raise AssertionError(msg)
E AssertionError:
E Arrays are not equal
E
E (mismatch 100.0%)
E x: array([ 0.540302, -0.633324, -0.931718, 0.05749 , 0.967249])
E y: array([ 0.540302, -0.633324, -0.931718, 0.05749 , 0.967249])
My gen_sine function is:
def gen_sine(A, f, phi, fs, t):
sampling_period = 1 / fs
num_samples = fs * t
samples_range = (np.arange(0, num_samples) * 2 * f * np.pi * sampling_period) + phi
return A * np.cos(samples_range)
Why is that? How should I compare the two arrays?
(I'm using numpy 1.9.3 and pytest 2.8.1)
The problem is that np.assert_array_equal returns None and does the assert statement internally. It is incorrect to preface it with a separate assert as you do:
assert np.assert_array_equal(x,y)
Instead in your test you would just do something like:
import numpy as np
from numpy.testing import assert_array_equal
def test_equal():
assert_array_equal(np.arange(0,3), np.array([0,1,2]) # No assertion raised
assert_array_equal(np.arange(0,3), np.array([2,0,1]) # Raises AssertionError
Update:
A few comments
Don't rewrite your entire original question, because then it was unclear what an answer was actually addressing.
As far as your updated question, the issue is that assert_array_equal is not appropriate for comparing floating point arrays as is explained in the documentation. Instead use assert_allclose and then set the desired relative and absolute tolerances.