Isn't this a row vector? [duplicate] - numpy

I know that a numpy array has an attribute called shape that returns (number of rows, number of columns), so shape[0] gives you the number of rows and shape[1] gives you the number of columns.
a = numpy.array([[1,2,3,4], [2,3,4,5]])
a.shape
>> (2, 4)
a.shape[0]
>> 2
a.shape[1]
>> 4
However, if my array has only one row, then it returns (number of columns,), and shape[1] is out of range. For example:
a = numpy.array([1,2,3,4])
a.shape
>> (4,)
a.shape[0]
>> 4 # this is the number of columns
a.shape[1]
>> IndexError: tuple index out of range
Now how do I get the number of rows of a numpy array if the array may have only one row?
Thank you

The concept of rows and columns applies when you have a 2D array. However, the array numpy.array([1,2,3,4]) is a 1D array and so has only one dimension, so shape correctly returns a one-element tuple.
For a 2D version of the same array, consider the following instead:
>>> a = numpy.array([[1,2,3,4]]) # notice the extra square brackets
>>> a.shape
(1, 4)

Rather than converting this to a 2D array, which may not always be an option, you could either check the len() of the tuple returned by shape or catch the IndexError, like so:
import numpy
a = numpy.array([1,2,3,4])
print(a.shape)
# (4,)
print(a.shape[0])
try:
    print(a.shape[1])
except IndexError:
    print("only 1 dimension")
Or you could just assign the shape to a variable for later use (or return it, or what have you), if you know you will only ever have 1- or 2-dimensional shapes:
try:
    shape = (a.shape[0], a.shape[1])
except IndexError:
    shape = (1, a.shape[0])
print(shape)
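Another option, sketched here under the assumption that a 1D array should count as a single row, is to let numpy.atleast_2d add the missing dimension. The helper name n_rows is made up for illustration:

```python
import numpy as np

row = np.array([1, 2, 3, 4])                    # 1-D, shape (4,)
table = np.array([[1, 2, 3, 4], [2, 3, 4, 5]])  # 2-D, shape (2, 4)

def n_rows(arr):
    """Hypothetical helper: number of rows, counting a 1-D array as one row."""
    # atleast_2d turns shape (4,) into (1, 4) and leaves 2-D input unchanged
    return np.atleast_2d(arr).shape[0]

print(row.ndim, n_rows(row))      # 1 1
print(table.ndim, n_rows(table))  # 2 2
print(row.reshape(1, -1).shape)   # (1, 4)
```

Checking arr.ndim first is an equally valid way to branch if you would rather not create the 2D view.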

Related

Adding a third dimension to my 2D array in a for loop

I have a for loop that gives me an output of 16 x 8 2D arrays per entry in the loop. I want to stack all of these 2D arrays along the z-axis in a 3D array. This way, I can determine the variance over the z-axis. I have tried multiple commands, such as np.dstack, matrix3D[p,:,:] = ... and np.newaxis both in- and outside the loop. However, the closest I've come to my desired output is just a repetition of the last array stacked on top of each other. Also the dimensions were way off. I need to keep the original 16 x 8 format. By now I'm in a bit too deep and could use some nudge in the right direction!
My code:
excludedElectrodes = [1,a.numberOfColumnsInArray,a.numberOfElectrodes-a.numberOfColumnsInArray+1,a.numberOfElectrodes]
matrixEA = np.full([a.numberOfRowsInArray, a.numberOfColumnsInArray], np.nan)
for iElectrode in range(a.numberOfElectrodes):
    if a.numberOfDeflectionsPerElectrode[iElectrode] != 0:
        matrixEA[iElectrode // a.numberOfColumnsInArray][iElectrode % a.numberOfColumnsInArray] = 0
for iElectrode in range (a.numberOfElectrodes):
    if iElectrode+1 not in excludedElectrodes:
        """Preprocessing"""
        # Loop over heartbeats
        for p in range (1,len(iLAT)):
            # Calculate parameters, store them in right row-col combo (electrode number)
            matrixEA[iElectrode // a.numberOfColumnsInArray][iElectrode % a.numberOfColumnsInArray] = (np.trapz(abs(correctedElectrogram[limitA[0]:limitB[0]]-totalBaseline[limitA[0]:limitB[0]]))/(1000))
            # Stack all matrixEA arrays along z axis
            matrix3D = np.dstack(matrixEA)
This example snippet does what you want, although I suspect your errors have more to do with things unrelated to the concatenation step. Here, we index each 2D array with None (equivalent to np.newaxis) to add a new axis of length 1, and then concatenate the arrays along that axis.
import numpy as np
# Stand-in function that creates a dummy (16,8) array
def foo(a):
    return np.random.random((16,8)) + a
arrays2D = []
# Your loop
for i in range(10):
    # Calculate your (16,8) array
    f = foo(i)
    # And append it to the list
    arrays2D.append(f)
# Stack arrays along new dimension
array3D = np.concatenate([i[...,None] for i in arrays2D], axis = -1)
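As a hedged alternative to the concatenate call, np.stack joins equally shaped arrays along a new axis in one step, and the variance over that axis then follows directly. The per_beat list below is a made-up stand-in for the real per-heartbeat results; np.nanvar is used only because the question fills its matrix with NaN placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the per-heartbeat computation: one new (16, 8) array per pass
per_beat = [rng.random((16, 8)) + p for p in range(10)]

# np.stack joins equally shaped arrays along a NEW axis; axis=-1 puts that axis last
array3D = np.stack(per_beat, axis=-1)
print(array3D.shape)    # (16, 8, 10)

# Variance across the stacked axis, skipping NaN entries if there are any
variance = np.nanvar(array3D, axis=-1)
print(variance.shape)   # (16, 8)
```

Each list entry has to be a separate array object. Appending the same array repeatedly while overwriting it in place stores several references to one array, which would produce repeated copies of the last result.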

What is the meaning of `numpy.array(value)`?

numpy.array(value) evaluates to true, if value is int, float or complex. The result seems to be a shapeless array (numpy.array(value).shape returns ()).
Reshaping the above like so numpy.array(value).reshape(1) works fine and numpy.array(value).reshape(1).squeeze() reverses this and again results in a shapeless array.
What is the rationale behind this behavior? Which use-cases exist for this behaviour?
When you create a zero-dimensional array like np.array(3), you get an object that behaves as an array in 99.99% of situations. You can inspect the basic properties:
>>> x = np.array(3)
>>> x
array(3)
>>> x.ndim
0
>>> x.shape
()
>>> x[None]
array([3])
>>> type(x)
numpy.ndarray
>>> x.dtype
dtype('int32')
So far so good. The logic behind this is simple: you can process any array-like object the same way, regardless of whether it is a number, list or array, just by wrapping it in a call to np.array.
One thing to keep in mind is that when you index an array, the index tuple must have ndim or fewer elements. So you can't do:
>>> x[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array
Instead, you have to use a zero-sized tuple (since x[] is invalid syntax):
>>> x[()]
3
You can also use the array as a scalar instead:
>>> y = x + 3
>>> y
6
>>> type(y)
numpy.int32
Adding two scalars produces a scalar instance of the dtype, not another array. That being said, you can use y from this example in exactly the same way you would x, 99.99% of the time, since NumPy scalar types expose almost the same attributes and methods as ndarray. It does not matter that 3 is a Python int, since np.add will wrap it in an array regardless. y = x + x will yield identical results.
One difference between x and y in these examples is that x is not officially considered to be a scalar:
>>> np.isscalar(x)
False
>>> np.isscalar(y)
True
The indexing issue can potentially throw a monkey wrench in your plans to index any array-like object. You can easily get around it by supplying ndmin=1 as an argument to the constructor, or using a reshape:
>>> x1 = np.array(3, ndmin=1)
>>> x1
array([3])
>>> x2 = np.array(3).reshape(-1)
>>> x2
array([3])
I generally recommend the former method, as it requires no prior knowledge of the dimensionality of the input.
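A third option, shown here as a minimal sketch, is np.atleast_1d, which promotes 0-d input to shape (1,) and returns higher-dimensional input unchanged. The helper first_element is hypothetical and only illustrates the "index anything" use case:

```python
import numpy as np

def first_element(value):
    """Hypothetical helper: first item of a scalar, list, or array."""
    arr = np.atleast_1d(value)   # 0-d input becomes shape (1,); other shapes are kept
    return arr.flat[0]

print(np.atleast_1d(3).shape)                # (1,)
print(np.atleast_1d([1, 2, 3]).shape)        # (3,)
print(np.atleast_1d(np.ones((2, 2))).shape)  # (2, 2)
print(first_element(3), first_element([7, 8]))  # 3 7
```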
Further reading:
Why are 0d arrays in Numpy not considered scalar?

Sklearn and Sparse Matrices ValueError

I'm aware similar questions have been asked before, and I've tried everything suggested in them, but I'm still stumped. I have a dataset with 2 columns: The first with vectors representing words stored as a 1x10000 sparse csr matrix (so a matrix in each cell), and the second contains integer ratings which I will use for classification. When I run the following code
for index, row in data.iterrows():
    print(row)
    print(row[0].shape)
I get the correct output for all the rows
Name: 0, dtype: object
(1, 10000)
Vector (0, 0)\t1.0\n (0, 1)\t1.0\n (0, 2)\t1.0\n ...
Rating 5
Now when I try passing my data in any SKlearn classifier like so:
uniform_random_classifier = DummyClassifier(strategy='uniform')
uniform_random_classifier.fit(data["Vectors"], data["Ratings"])
I get the following error:
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
What am I doing wrong? I've made sure all my sparse matrices are the same size and I've tried reshaping my data in various ways, but with no luck, and the Sklearn classifiers are supposed to be able to deal with csr matrices.
Update: Converting the entire "Vectors" column into one large 2-D matrix did the trick, but for completeness' sake, the following is the code I used to generate my dataframe, in case anyone is curious and wants to try solving the original issue. Assume data is a pandas dataframe with rows that look like
"560 420 222" 5.0
"2345 2344 2344 5" 3.0
import pandas as pd
from scipy import sparse
def vectorize(feature, size):
    """Given a numeric string generated from a vocabulary table return a binary vector representation of
    each feature"""
    vector = sparse.lil_matrix((1, size))
    for number in feature.split(' '):
        try:
            vector[0, int(number) - 1] = 1
        except ValueError:
            pass
    return vector
def vectorize_dataset(data, vectorize, size):
    """Given a dataset in the appropriate "num num num..." format, a specific vectorization format, and a vector size,
    returns the dataset in vectorized form"""
    result_data = pd.DataFrame(index=range(data.shape[0]), columns=["Vector", "Rating"])
    for index, row in data.iterrows():
        # All the mixing up of decodings and encoding has made it so that Pandas incorrectly parses EOF chars
        if type(row[0]) == type('str'):
            result_data.iat[index, 0] = vectorize(row[0], size).tocsr()
            result_data.iat[index, 1] = data.loc[index][1]
    return result_data
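The fix mentioned in the update (one large 2-D matrix instead of a column of 1 x n matrices) can be sketched with scipy.sparse.vstack. This is an illustration under assumptions, not the asker's actual code: the row_vectors list and ratings array are made-up stand-ins for the "Vector" and "Rating" columns:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the "Vector" column: three 1 x 6 CSR matrices
row_vectors = [
    sparse.csr_matrix(np.array([[1.0, 0.0, 0.0, 1.0, 0.0, 0.0]])),
    sparse.csr_matrix(np.array([[0.0, 1.0, 0.0, 0.0, 0.0, 1.0]])),
    sparse.csr_matrix(np.array([[0.0, 0.0, 1.0, 0.0, 1.0, 0.0]])),
]
ratings = np.array([5, 3, 4])   # hypothetical stand-in for the "Rating" column

# Stack the 1 x n rows into a single (n_samples, n_features) sparse matrix
X = sparse.vstack(row_vectors, format="csr")
print(X.shape)            # (3, 6)
print(sparse.issparse(X)) # True
```

A matrix shaped like X, with one row per sample, is the layout an estimator's fit(X, y) call expects. A pandas column of separate sparse objects is instead converted element by element, which is a plausible source of the "setting an array element with a sequence" error.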

Python: AttributeError: "'numpy.float64' object has no attribute 'tanh'"

I have seen a couple of questions with similar titles, but I am afraid none of them satisfactorily answers my question: how do I take the tan inverse, or let's say the exp, of a numpy ndarray? For instance, a piece of my code looks similar to this:
import numpy as np
from numpy import ndarray,zeros,array,dot,exp
import itertools
from functools import reduce  # reduce is not a builtin in Python 3
def zetta_G(x,spr_g,theta_g,c_g):
    #this function computes estimated g:
    #c_g is basically a matrix of dim equal to g and whose elements contains list of centers that describe the fuzzy system for each element of g:
    m,n=c_g.shape[0],c_g.shape[1]
    #creating an empty matrix of dim mxn to hold regressors:
    zetta_g=zeros((m,n),dtype=ndarray)
    #creating an empty matrix of dim mxn to hold estimated g:
    z_g=np.zeros((m,n),dtype=ndarray)
    #for filling rows
    for k in range(m):
        #for filling columns
        for p in range(n):
            #container to hold-length being equal to number of inputs(e1,e2,e3 etc)
            Mu=[[] for i in range(len(x))]
            for i in range(len(x)):
                #filling that with number of zeros equal to len of center
                Mu[i]=np.zeros(len(c_g[k][p]))
            #creating an empty list for holding rules
            M=[]
            #piece of code for creating rules-all possible combinations
            for i in range(len(x)):
                for j in range(len(c_g[k][p])):
                    Mu[i][j]=exp(-.5*((x[i]-c_g[k][p][j])/spr_g[k][p])**2)
            b=list(itertools.product(*Mu))
            for i in range(len(b)):
                M.append(reduce(lambda x,y:x*y,b[i]))
            M=np.array(M)
            S=np.sum(M)
            #import pdb;pdb.set_trace()
            zetta_g[k][p]=M/S
            z_g[k][p]=dot(M/S,theta_g[k][p])
    return zetta_g,z_g
if __name__=='__main__':
    x=[1.2,.2,.4]
    cg11,cg12,cg13,cg21,cg22,cg23,cg31,cg32,cg33=[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-12,-9,-6,-3,0,3,6,9,12],[-6.5,-4.5,-2.5,0,2.5,4.5,6.5],[-5,-4,-3,-2,-1,0,1,2,3,4,5],[-3.5,-2.5,-1.5,0,1.5,2.5,3.5]
    C,spr_f=array([[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10]]),[2.2,2,2.1]
    c_g=array([[cg11,cg12,cg13],[cg21,cg22,cg23],[cg31,cg32,cg33]])
    spr_g=array([[2,2.1,2],[2.1,2.2,3],[2.5,1,1.5]])
    theta_g=np.zeros((c_g.shape[0],c_g.shape[1]),dtype=ndarray)
    #import pdb;pdb.set_trace()
    N=0
    for i in range(c_g.shape[0]):
        for j in range(c_g.shape[1]):
            length=len(c_g[i][j])**len(x)
            theta_g[i][j]=np.random.sample(length)
            N=N+(len(c_g[i][j]))**len(x)
    zetta_g,z_g=zetta_G(x,spr_g,theta_g,c_g)
    #zetta is a function that accepts following args-- x: which is a list of certain dim, spr_g: is a matrix of dimension similar to theta_g and c_g. theta_g and c_g are numpy matrices with lists as individual elements
    print(zetta_g)
    print(z_g)
    inv=np.tanh(z_g)
    print(inv)
In [89]: a=np.array([[1],[3],[2]],dtype=np.ndarray)
In [90]: a
Out[90]:
array([[1],
       [3],
       [2]], dtype=object)
Note that the dtype is object, not ndarray. If the dtype isn't one of the recognized numeric or string types, it is object, a generic pointer, just like the elements of a list.
In [91]: np.tanh(a)
AttributeError: 'int' object has no attribute 'tanh'
np.tanh tries to delegate the task to the elements of the array by calling a tanh method on each one. Math on object dtype arrays is generally performed by list-like iteration over the elements; it does not use the fast compiled numeric numpy code.
If a is an ordinary numeric array:
In [95]: np.tanh(np.array([[1],[3],[2]]))
Out[95]:
array([[0.76159416],
       [0.99505475],
       [0.96402758]])
With object dtype arrays, your ability to do numeric calculations is limited. Some things work, others don't. It's hit-or-miss.
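When every element of the object array is really a plain number, one way out is to convert to a numeric dtype before calling the ufunc. A minimal sketch, assuming the array holds only scalars (the names z_obj and z_float are made up here):

```python
import numpy as np

# dtype=np.ndarray is not a recognized element type, so NumPy falls back to object
z_obj = np.array([[1], [3], [2]], dtype=np.ndarray)
print(z_obj.dtype)   # object

try:
    np.tanh(z_obj)
except (AttributeError, TypeError) as err:
    # The exact exception type and message differ between NumPy versions
    print("tanh failed:", type(err).__name__)

# Converting to a numeric dtype restores the compiled elementwise math
z_float = z_obj.astype(float)
result = np.tanh(z_float)
print(result.shape)  # (3, 1)
```

This conversion only works when the elements are scalars. An object array whose elements are themselves arrays of different lengths cannot be converted this way.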
Here's a first stab at cleaning up your code; it's not tested.
def zetta_G(x,spr_g,theta_g,c_g):
    m,n=c_g.shape[0],c_g.shape[1]
    #creating an empty matrix of dim mxn to hold regressors:
    zetta_g=zeros((m,n),dtype=object)
    #creating an empty matrix of dim mxn to hold estimated g:
    z_g=np.zeros((m,n),dtype=object)
    #for filling rows
    for k in range(m):
        #for filling columns
        for p in range(n):
            #container to hold-length being equal to number of inputs(e1,e2,e3 etc)
            Mu = np.zeros((len(x), len(c_g[k,p])))
            #creating an empty list for holding rules
            for i in range(len(x)):
                Mu[i,:]=exp(-.5*((x[i]-c_g[k,p,:])/spr_g[k,p])**2)
            # probably can calc Mu without any loop
            M = []
            b=list(itertools.product(*Mu))
            for i in range(len(b)):
                M.append(reduce(lambda x,y:x*y,b[i]))
            M=np.array(M)
            S=np.sum(M)
            zetta_g[k,p]=M/S
            z_g[k,p]=dot(M/S,theta_g[k,p])
    return zetta_g,z_g
Running your code and adding some .shape displays, I see that z_g is (3,3) and contains just single numbers, so it can be initialized as a plain 2D float array:
z_g=np.zeros((m,n))
theta_g is (3,3), but with variable-length array elements:
print([i.shape for i in theta_g.flat])
[(1331,), (1331,), (1331,), (1331,), (1331,), (729,), (343,), (1331,), (343,)]
zetta_g matches these shapes.
If I change:
x=np.array([1.2,.2,.4])
I can calculate Mu without a loop with:
Mu = exp(-.5*((x[:,None]-np.array(c_g[k,p])[None,:])/spr_g[k,p])**2)
c_g is a (3,3) array with variable-length lists; I can vectorize the
(x[i]-c_g[k,p][j])
expression with:
x[:,None]-np.array(c_g[k,p])[None,:]
Not a big time saver here, since x has 3 elements and the c_g elements are only 7-11 long. But cleaner.
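To check that the broadcast expression matches the double loop, here is a small self-contained comparison. It uses one set of centers and its spread taken from the question's data (the cg31 list and the 2.5 entry of spr_g); the variable names are chosen for this sketch only:

```python
import numpy as np

x = np.array([1.2, 0.2, 0.4])                                  # inputs from the question
centers = np.array([-6.5, -4.5, -2.5, 0.0, 2.5, 4.5, 6.5])     # one c_g[k][p] entry
spread = 2.5                                                   # the matching spr_g[k][p]

# Loop version: one Gaussian membership value per (input, center) pair
Mu_loop = np.zeros((len(x), len(centers)))
for i in range(len(x)):
    for j in range(len(centers)):
        Mu_loop[i, j] = np.exp(-0.5 * ((x[i] - centers[j]) / spread) ** 2)

# Broadcast version: shape (3, 1) minus shape (1, 7) gives a (3, 7) array
Mu_vec = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / spread) ** 2)

print(Mu_vec.shape)                  # (3, 7)
print(np.allclose(Mu_loop, Mu_vec))  # True
```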
In this running code I don't see a tanh, so I don't know what kinds of arrays are using that.
You set the type of the array's elements to dtype=np.ndarray. Replace it with, let's say, dtype=np.float64 or any other numeric type.

combine 2-D arrays of unknown sizes to make one 3-D

I have a function, say peaksdetect(), that will generate a 2-D array of unknown number of rows; I will call it a few times, let's say 3 and I would like to make of these 3 arrays, one 3-D array. Here is my start but it is very complicated with a lot of if statements, so I want to make things simpler if possible:
import numpy as np
dim3 = 3 # the number of times peaksdetect() will be called
         # it is named dim3 because this number will determine
         # the size of the third dimension of the result 3-D array
for num in range(dim3):
    data = peaksdetect(dataset[num]) # generates a 2-D array of unknown number of rows
    if num == 0:
        array3D = np.zeros((dim3,) + data.shape) # in fact the new dimension is in position 0
                                                 # so dimensions 0 and 1 of "data" will be
                                                 # 1 and 2 respectively
    else:
        if data.shape[0] > array3D.shape[1]:
            "adjust array3D.shape[1] so that it equals data.shape[0] by filling with zeroes"
            array3D[num] = data
        else:
            "adjust data.shape[0] so that it equals array3D.shape[1] by filling with zeroes"
            array3D[num] = data
...
If you are counting on having to resize your array, there is very likely not going to be much to be gained by preallocating it. It will probably be simpler to store your arrays in a list, then figure out the size of the array to hold them all, and dump the data into it:
data = []
for num in range(dim3):
    data.append(peaksdetect(dataset[num]))
shape = map(max, zip(*(j.shape for j in data)))
shape = (dim3,) + tuple(shape)
data_array = np.zeros(shape, dtype=data[0].dtype)
for j, d in enumerate(data):
    data_array[j, :d.shape[0], :d.shape[1]] = d
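Since peaksdetect() and dataset are not shown, here is a runnable sketch of the same collect-then-pad approach with a made-up stand-in, fake_peaksdetect, that returns arrays with different numbers of rows:

```python
import numpy as np

def fake_peaksdetect(n_rows, n_cols=2):
    """Hypothetical stand-in for peaksdetect(): a nonzero 2-D array with n_rows rows."""
    return np.arange(1, n_rows * n_cols + 1, dtype=float).reshape(n_rows, n_cols)

# Three calls returning 2, 4 and 3 rows respectively
results = [fake_peaksdetect(n) for n in (2, 4, 3)]

# Largest extent along each axis across all results
max_rows, max_cols = (max(sizes) for sizes in zip(*(r.shape for r in results)))

# Zero-filled container; each result is copied into its top-left corner
padded = np.zeros((len(results), max_rows, max_cols), dtype=results[0].dtype)
for j, r in enumerate(results):
    padded[j, :r.shape[0], :r.shape[1]] = r

print(padded.shape)    # (3, 4, 2)
print(padded[0, 2:])   # rows past the first result's 2 rows stay zero
```

If zero is a legitimate data value, filling the container with np.nan instead (via np.full) keeps the padding distinguishable from real entries.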