Dtype works with FROM but not IMPORT - numpy

I swear I read almost all the "FROM vs IMPORT" questions before asking this.
While going through the NumPy tutorial I was using:
import numpy as np
but ran into trouble when declaring dtype of a matrix like:
a = np.ones((2,3),dtype=int32)
I kept getting "NameError: name 'int32' is not defined." I am using Python v3.2, and am following the tentative tutorial that goes along with it. I used:
from numpy import *
a = ones((2,3),dtype=int32)
Which works. Any insight as to why this is would be much appreciated.
Thank you in advance!

import numpy as np
# this works because int32 is defined inside the numpy module
a = np.ones((2,3), dtype=np.int32)
# this also works: NumPy parses the string 'int32' into a dtype
b = np.ones((2,3), dtype='int32')
# this raises NameError: import numpy as np only binds the name np, so Python doesn't know what a bare int32 is
c = np.ones((2,3), dtype=int32)
Back to your example:
from numpy import *
# this now works because from numpy import * copies int32 (along with numpy's other public names) into your namespace
d = np.ones((2,3), dtype=int32)
I tend to define the type using strings, as in array b.
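A middle ground between the two import styles is to import only the names you need. The sketch below is plain Python; the explicit `from numpy import int32` is the only piece not taken from the answer above. It shows that the three spellings produce the same dtype:

```python
import numpy as np
from numpy import int32  # bind only this one name instead of everything in numpy

a = np.ones((2, 3), dtype=np.int32)  # attribute looked up on the np module
b = np.ones((2, 3), dtype='int32')   # string, parsed by NumPy into a dtype
c = np.ones((2, 3), dtype=int32)     # bare name works because it was imported above

print(a.dtype, b.dtype, c.dtype)  # int32 int32 int32
```

Importing names explicitly keeps the namespace readable, whereas `from numpy import *` can silently shadow built-ins such as `sum` or `max`.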

Related

Any way to use ":" to access a row for a 1D NumPy array

import numpy as np
A=np.array([1,2,3])
Is there any way to achieve A[1,:]? In MATLAB this works fine.
If you want to treat your NumPy array as a 2-dimensional array like in MATLAB, you have to say so explicitly, by creating a new array with np.newaxis.
import numpy as np
A=np.array([1,2,3])
print(A)
B = A[np.newaxis,:]
print(B)
# Here you go
print(B[0,:])
Side note:
I wrote B[0,:], not B[1,:], because NumPy indices are 0-based, not 1-based like in MATLAB.
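For comparison, here is a short sketch of a few equivalent ways to get a MATLAB-style row (or column) from a 1-D array. reshape and None are standard NumPy alternatives to np.newaxis, not something the answer itself uses:

```python
import numpy as np

A = np.array([1, 2, 3])

B = A[np.newaxis, :]      # shape (1, 3): a single row
C = A.reshape(1, -1)      # same result; -1 lets NumPy infer the length
D = A[None, :]            # np.newaxis is simply an alias for None
col = A[:, np.newaxis]    # shape (3, 1): a MATLAB-style column vector

print(A.shape, B.shape)   # (3,) (1, 3)
print(B[0, :])            # [1 2 3]
```

On the original 1-D array, a bare `A[:]` already selects every element; the extra axis is only needed when code expects two indices.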

MATLAB .mat in Pandas DataFrame to be used in Tensorflow

I have spent days trying to figure this out; hopefully someone can help.
I am loading a .mat file into Python using scipy.io, placing the struct into a DataFrame, which will then be used in TensorFlow.
from scipy.io import loadmat
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#import TF
path = '/home/anthony/PycharmProjects/Deep_Learning_MATLAB/circuit-data/for tinghao/template1-lib5-eqns-CR-RESULTS-SET1-FINAL.mat'
raw_data = loadmat(path, squeeze_me=True)
data = raw_data['Graphs']
df = pd.DataFrame(data, dtype=int)
df.pop('transferFunc')
print(df.dtypes)
The output is:
A object
Ln object
types object
nz int64
np int64
dtype: object
Process finished with exit code 0
The struct is (43249x6). Each cell in the 'A' column is a matrix of a different size, e.g. 18x18 or 16x16. Each cell in 'Ln' is a row of letters, each in its own cell. Each cell in 'types' contains 12 columns of numbers, and I have no issues with 'nz' and 'np'.
I want to put all columns into a DataFrame and use column 'A', 'Ln' or 'types' as the labels and 'nz' and 'np' as the features; again, I have no issues with the latter. Can anyone help with this or suggest a workaround?
The end goal is to have tensorflow train on nz and np and give me either a matrix, Ln, or Type.
What type of data is in your .mat file? Is your application very time-critical?
If you can collect all your data in a struct, you could give MATLAB's jsonencode a try: write the struct to a JSON file and load it back into Python with the json module (see the json documentation on loading data).
Then you can create a pandas DataFrame via
pd.DataFrame.from_dict()
Of course this would only be a workaround. You would still have to make sure the data in the MATLAB struct is correctly ordered before it is imported and converted to a DataFrame.
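Here is a minimal sketch of the Python side of that workaround, assuming the struct has already been written out with jsonencode. The JSON text below is made up for illustration (the field names `nz`, `np` and `A` are borrowed from the question). A struct array usually encodes as a list of objects, which `pd.DataFrame.from_records` accepts. A scalar struct whose fields are equal-length arrays encodes as a single object, which suits `pd.DataFrame.from_dict`:

```python
import json
import pandas as pd

# Hypothetical jsonencode output for a 1x2 struct array
records_json = '''
[
  {"nz": 3, "np": 2, "A": [[0, 1], [1, 0]]},
  {"nz": 5, "np": 4, "A": [[0, 1, 1], [1, 0, 0], [1, 0, 0]]}
]
'''
# Hypothetical jsonencode output for a scalar struct with equal-length array fields
columns_json = '{"nz": [3, 5], "np": [2, 4]}'

# One dict per struct element -> one row each; matrices stay as nested lists (object column)
df_rows = pd.DataFrame.from_records(json.loads(records_json))

# One list per field -> one column each
df_cols = pd.DataFrame.from_dict(json.loads(columns_json))

print(df_rows.dtypes)
print(df_cols)
```

Matrices of different sizes end up as Python lists in an object column, so they still need converting (e.g. with np.array) before a model can consume them.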
raw_data = loadmat(path, squeeze_me=True)
data = raw_data['Graphs']
graph_labels = pd.DataFrame()
graph_labels['perf'] = raw_data['Objective'][0:1000]
graph_labels['np'] = data['np'][0:1000]
The code above helped. It's very simple and drawn out, but it got the job done. However, it does not work in TensorFlow, because TensorFlow does not accept this format, and that was my main issue. I have to convert the adjacency matrices to networkx graphs and then load them into StellarGraph.

Rolling multidimensional function in pandas

Let's say I have the following code.
import numpy as np
import pandas as pd
x = pd.DataFrame(np.random.randn(100, 3)).rolling(window=10, center=True).cov()
For each index, I have a 3x3 matrix. I would like to calculate its eigenvalues and then some function of those eigenvalues. Or perhaps I might want to compute some function of the eigenvalues and eigenvectors. The point is that if I take x.loc[0], I have no problem computing anything from that matrix. How do I do this in a rolling fashion for all the matrices?
Thanks!
You can use the eigenvalue/eigenvector functions in scipy.linalg and loop over the matrices.
import numpy as np
import pandas as pd
from scipy import linalg as LA
x = pd.DataFrame(np.random.randn(100, 3)).rolling(window=10, center=True).cov()
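# x has a MultiIndex (row, column): each first-level label picks out one 3x3 matrix
for i in x.index.get_level_values(0).unique():
    m = x.loc[i]
    # windows at the edges of the frame are incomplete, so their matrices are all NaN
    if m.isnull().values.any():
        continue
    e_vals, e_vec = LA.eig(m)
    print(e_vals, e_vec)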
The NaN guard is only needed because the rolling windows at the edges contain NaN values, and scipy.linalg.eig rejects NaN input by default. If there are no NaN values present, a plain for loop is enough.
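As an alternative to looping, here is a sketch that relies on np.linalg.eigh, which accepts a whole stack of matrices and is appropriate here because covariance matrices are symmetric. It reshapes the rolling covariance into an (n, 3, 3) array, computes eigenpairs only for the complete windows, and then applies a function of the eigenvalues. The "largest eigenvalue's share of the total" below is just an illustrative choice of function:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 3))
cov = df.rolling(window=10, center=True).cov()

k = df.shape[1]  # size of each covariance matrix (3 here)
# Sort so rows are ordered (window, column); then every k rows form one k x k matrix
mats = cov.sort_index().to_numpy().reshape(-1, k, k)  # shape (100, 3, 3)
valid = ~np.isnan(mats).any(axis=(1, 2))              # incomplete edge windows are NaN

# eigh processes the whole stack at once; eigenvalues come back in ascending order
eigvals = np.full((len(mats), k), np.nan)
eigvecs = np.full((len(mats), k, k), np.nan)
eigvals[valid], eigvecs[valid] = np.linalg.eigh(mats[valid])

# Example function of the eigenvalues: share of the variance on the largest one
share = pd.Series(eigvals[:, -1] / eigvals.sum(axis=1), index=df.index)
print(share.dropna().head())
```

Rows whose window is incomplete keep NaN in `eigvals`, `eigvecs` and `share`, so the result stays aligned with the original index.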

The corresponding ctypes type of a numpy.dtype?

If I have a numpy ndarray with a certain dtype, how do I know what is the corresponding ctypes type?
For example, if I have an ndarray, I can do the following to convert it to a shared array:
import multiprocessing as mp
import numpy as np
import ctypes
x_np = np.random.rand(10, 10)
x_mp = mp.Array(ctypes.c_double, x_np.ravel())  # mp.Array needs a flat (1-D) sequence
However, I have to specify c_double here. It works if I don't specify the exact same type, but I would like to keep the type the same. How should I find out the ctypes type of the ndarray x_np automatically, at least for some common elementary data types?
This is now supported by numpy.ctypeslib.as_ctypes_type(dtype):
import numpy as np
x_np = np.random.rand(10, 10)
np.ctypeslib.as_ctypes_type(x_np.dtype)
Gives ctypes.c_double, as expected.
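Putting this together with the code from the question, here is a sketch that assumes a reasonably recent NumPy (as_ctypes_type appeared in 1.16). It flattens the array, since mp.Array takes a one-dimensional sequence, and then uses np.frombuffer to view the shared buffer with the original dtype and shape. np.frombuffer is an addition for illustration, not part of the answer:

```python
import multiprocessing as mp
import numpy as np

x_np = np.random.rand(10, 10)

# Look up the ctypes type that matches the array's dtype
ctype = np.ctypeslib.as_ctypes_type(x_np.dtype)

# mp.Array expects a flat sequence, so pass the raveled data
x_mp = mp.Array(ctype, x_np.ravel())

# View the shared buffer as an ndarray with the original dtype and shape (no copy)
x_shared = np.frombuffer(x_mp.get_obj(), dtype=x_np.dtype).reshape(x_np.shape)
```

Writes through `x_shared` go straight into the shared memory, so other processes that receive `x_mp` see them.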
There is actually a way to do this that's built into NumPy:
import numpy as np
x_np = np.random.rand(10, 10)
typecodes = np.ctypeslib._get_typecodes()
typecodes[x_np.__array_interface__['typestr']]
Output:
ctypes.c_double
The caveat is that the np.ctypeslib._get_typecodes function is marked as private (i.e. its name starts with _). However, its implementation doesn't seem to have changed in some time, so you can probably use it fairly reliably.
Alternatively, the implementation of _get_typecodes is pretty short, so you could just also copy the whole function over to your own code:
import ctypes
import numpy as np
def get_typecodes():
    # Map array-interface type strings (e.g. '<f8') to ctypes types;
    # where two ctypes types share a type string, the later one in the list wins
    ct = ctypes
    simple_types = [
        ct.c_byte, ct.c_short, ct.c_int, ct.c_long, ct.c_longlong,
        ct.c_ubyte, ct.c_ushort, ct.c_uint, ct.c_ulong, ct.c_ulonglong,
        ct.c_float, ct.c_double,
    ]
    return {np.dtype(ctype).str: ctype for ctype in simple_types}

Problems while learning cython

I am learning Cython to speed up NumPy code. I wrote some code to see how to optimize a NumPy array calculation.
The python code is:
from numpy import *
def set_onsite(n):
    a=linspace(0,n,n+1)
    onsite=zeros([n+1,n+1],float)
    for i in range(0,n+1):
        onsite[i,i]=a[i]*a[i]
    return onsite
Then, I tried to cythonize this code:
import numpy as np
cimport numpy as np
cimport cython
import cython
#cython.boundscheck(False)
#cython.wraparound(False)
#cython.nonecheck(False)
def set_onsite(np.int_t n):
    cdef np.ndarray[double,ndim=1,mode='c'] a=np.linspace(0,n,n+1)
    cdef np.ndarray[double,ndim=2,mode='c'] onsite=np.empty(n+1,n+1)
    cdef np.int_t i
    for i in range(0,n+1):
        onsite[i,i]=a[i]*a[i]
    return onsite
After running the setup.py file, I got the .so file. I ran %timeit myfile.set_onsite(10000), but IPython showed
TypeError: data type not understood
So could anyone tell me what is going on here?
I checked my code many times but I did not figure out where the problem arises.
The problem has nothing to do with cython; it's just that np.empty expects the first argument to be the shape given as an int or tuple of ints. The second argument is interpreted as the dtype:
In [19]: np.empty(5,5)
TypeError: data type not understood
while np.empty((5,5)) returns an empty array of shape (5,5).
So instead use
cdef np.ndarray[double,ndim=2,mode='c'] onsite=np.empty((n+1,n+1))
Note the double set of parentheses around n+1, n+1. Or, use np.zeros instead of np.empty to make the Cython function match the Python function.
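Because the bug is in the NumPy call rather than in Cython, it can be reproduced and checked in plain Python. The sketch below also shows that, for this particular function, NumPy can build the same diagonal matrix without an explicit loop. np.diag is a swapped-in approach for comparison, not what the answer proposes:

```python
import numpy as np

# The shape must be one int or one tuple; a second positional argument is read as the dtype
try:
    np.empty(5, 5)
except TypeError as exc:
    print("TypeError:", exc)

onsite = np.zeros((5, 5))  # correct: shape passed as a single tuple
print(onsite.shape)        # (5, 5)

def set_onsite_loop(n):
    # Same logic as the question's Python version
    a = np.linspace(0, n, n + 1)
    onsite = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        onsite[i, i] = a[i] * a[i]
    return onsite

def set_onsite_vectorized(n):
    # np.diag builds a square matrix with the given 1-D array on its diagonal
    a = np.linspace(0, n, n + 1)
    return np.diag(a * a)

print(np.array_equal(set_onsite_loop(4), set_onsite_vectorized(4)))  # True
```

Note that for n = 10000 either version allocates a dense 10001 x 10001 float64 matrix (roughly 800 MB), so the memory cost may dominate any timing comparison.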
PS: When debugging Python, it is helpful to note not only the error message, but the line that raises the exception:
File "comp.pyx", line 13, in comp.set_onsite (comp.c:1290)
cdef np.ndarray[double,ndim=2,mode='c'] onsite=np.empty(n+1,n+1)
TypeError: data type not understood