What is the difference between <class 'numpy.ndarray'> and numpy.ndarray? - numpy

I have been doing some calculations using numpy arrays and have arrived at the question of what is the difference between:
<class 'numpy.ndarray'> and numpy.ndarray
I have noticed that the following operation only works on <class 'numpy.ndarray'>:
(array == 1).sum()
Why is that so?

There's no difference; they're identical.
numpy.ndarray is the actual type of numpy arrays; <class 'numpy.ndarray'> is the string representation of numpy.ndarray:
>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> a
array([1, 2, 3])
>>> print(type(a) == np.ndarray)
True
>>> np.ndarray
<class 'numpy.ndarray'>
>>> print(type(a))
<class 'numpy.ndarray'>
>>> str(type(a))
"<class 'numpy.ndarray'>"
>>> repr(type(a))
"<class 'numpy.ndarray'>"
Interactive shells such as IPython and Jupyter (Jupyter runs IPython under the hood) will trim off the <class '...'> part and only show the type itself when you enter the type into the interpreter, e.g. ipython:
$ ipython
Python 3.9.9 (main, Nov 21 2021, 03:23:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.1.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import numpy as np
In [2]: np.ndarray
Out[2]: numpy.ndarray
...versus python (the builtin interpreter):
$ python3
Python 3.9.9 (main, Nov 21 2021, 03:23:44)
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.ndarray
<class 'numpy.ndarray'>
...but they're the exact same type.

I wonder if you are confusing numpy arrays and Python lists. We often talk about a "numpy array", by which we actually mean an object of class/type np.ndarray.
In [144]: a = [1, 2, 3] # a list
In [145]: b = np.array(a) # an array
In [146]: type(a), type(b)
Out[146]: (list, numpy.ndarray)
Your expression works with the array, but not the list:
In [147]: (b == 1).sum()
Out[147]: 1
In [148]: (a == 1).sum()
Traceback (most recent call last):
  Input In [148], in <module>
    (a == 1).sum()
AttributeError: 'bool' object has no attribute 'sum'
In [149]: b == 1
Out[149]: array([ True, False, False])
In [150]: a == 1
Out[150]: False
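If you do want that elementwise count for a plain list, summing a comparison over the elements does the job (a minimal sketch):
a = [1, 2, 3]
# For a plain list, == is not elementwise, so count matches manually.
count = sum(x == 1 for x in a)
print(count)  # 1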
Note that I created b with np.array(). There is an np.ndarray constructor as well, but we don't usually call it directly - it's a low-level creator that most of us don't need. A useful starting page:
https://numpy.org/doc/1.22/user/basics.creation.html
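To illustrate the difference (a minimal sketch; note that np.ndarray takes a shape, not data, and returns an uninitialized array):
import numpy as np

# The usual way: np.array() infers shape and dtype from the data.
b = np.array([1, 2, 3])

# The low-level constructor: np.ndarray() takes a shape and returns an
# array whose initial contents are arbitrary (uninitialized memory).
raw = np.ndarray(shape=(3,), dtype=int)

print(type(b) is type(raw))  # True - both are np.ndarray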

Related

Cannot get the shape attribute value of a dataset in tensorflow?

Consider the following code:
import tensorflow as tf
data = tf.data.Dataset.range(10)
tf.print(data)
The output is
<RangeDataset shapes: (), types: tf.int64>
The shape is empty.
Just like Python's range, Dataset.range doesn't return the actual values. Instead it returns a generator-like object called RangeDataset. To get a numpy iterator, call RangeDataset.as_numpy_iterator(). Then you can convert it to a list, just like you would with list(range(5)):
>>> tf.data.Dataset.range(5)
<RangeDataset shapes: (), types: tf.int64>
>>> list(tf.data.Dataset.range(5).as_numpy_iterator())
[0, 1, 2, 3, 4]
>>> range(5)
range(0, 5)
>>> list(range(5))
[0, 1, 2, 3, 4]
For more examples of its usage, see the documentation.
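As a quick sketch (assuming TensorFlow 2.x eager execution): iterating the dataset directly yields tf.Tensor objects, while as_numpy_iterator() yields plain values:
import tensorflow as tf

ds = tf.data.Dataset.range(5)

# Iterating the dataset directly yields tf.Tensor objects...
for t in ds:
    print(t)  # tf.Tensor(0, shape=(), dtype=int64), ...

# ...while as_numpy_iterator() yields plain numpy values.
for v in ds.as_numpy_iterator():
    print(v)  # 0, 1, 2, 3, 4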

AttributeError: 'Series' object has no attribute 'pipe'

I am getting the following error when I run my Python 3 Keras code on an EC2 instance. The code works fine in an Azure Jupyter Notebook.
Code:
import numpy
import pandas

numpy.random.seed(7)
dataframe = pandas.read_csv("some_data.csv", header=None)
df = dataframe
char_cols = df.dtypes.pipe(lambda x: x[x == 'object']).index
for c in char_cols:
    df[c] = pandas.factorize(df[c])[0]
dataframe = df
Error:
Traceback (most recent call last):
File "pi_8_1st_year.py", line 12, in <module>
char_cols = df.dtypes.pipe(lambda x: x[x == 'object']).index
File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 1815, in __getattr__
(type(self).__name__, name))
AttributeError: 'Series' object has no attribute 'pipe'
My configuration:
ubuntu#ipxxxx:~$ python3
Python 3.4.3 (default, Nov 28 2017, 16:41:13)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
_frozen_importlib:321: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
/usr/lib/python3.4/importlib/_bootstrap.py:321: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
return f(*args, **kwds)
>>> pandas.__version__
'0.13.1'
>>> import numpy
>>> numpy.__version__
'1.15.0'
>>> import sklearn
/usr/lib/python3.4/importlib/_bootstrap.py:321: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
return f(*args, **kwds)
/usr/lib/python3.4/importlib/_bootstrap.py:321: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
return f(*args, **kwds)
>>> sklearn.__version__
'0.19.2'
>>> import keras
Using TensorFlow backend.
/usr/lib/python3.4/importlib/_bootstrap.py:321: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
return f(*args, **kwds)
>>> keras.__version__
'2.2.2'
>>>
As the error suggests, you can't use pipe on a Series object in your pandas version: df.dtypes returns a Series, and Series.pipe was only added in a later pandas release (your 0.13.1 predates it), hence the error.
If you want to find the columns with object dtype, you can do it like this:
s = (df.dtypes == 'object')
cols = s[s].index
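If you can upgrade to a newer pandas, select_dtypes is the idiomatic way to do this (a sketch, assuming a reasonably recent pandas):
import pandas as pd

df = pd.DataFrame({'a': ['x', 'y'], 'b': [1, 2]})

# select_dtypes restricts the DataFrame to the requested dtypes.
obj_cols = df.select_dtypes(include=['object']).columns
print(list(obj_cols))  # ['a']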

Cython self defined data types ndarray

I've created a dtype for my np.ndarrays:
particle_t = np.dtype([
    ('position', float, 2),
    ('momentum', float, 2),
    ('velocity', float, 2),
    ('force', float, 2),
    ('charge', int, 1),
])
According to the official examples, one can declare:
def my_func(np.ndarray[dtype, dim] particles):
but when I try to compile:
def tester(np.ndarray[particle_t, ndim = 1] particles):
I get an "Invalid type" error. Another usage I've seen is the memoryview syntax, like int[:], but trying def tester(particle_t[:] particles): results in:
'particle_t' is not a type identifier.
How can I fix this?
Obviously, particle_t is not a type but a Python object as far as Cython is concerned.
It is similar to np.int32 being a Python object, and thus
def tester(np.ndarray[np.int32] particles):  # doesn't work!
    pass
won't work; you need to use the corresponding Cython type, i.e. np.int32_t:
def tester(np.ndarray[np.int32_t] particles):  # works!
    pass
But what is the corresponding type for particle_t? You need to create a packed struct which mirrors your numpy dtype. Here is a simplified version:
# Python code:
particle_t = np.dtype([
    ('position', np.float64, 2),  # It is better to specify the number of bytes exactly!
    ('charge', np.int32, 1),     # otherwise you might be surprised...
])
and corresponding Cython code:
%%cython
import numpy as np
cimport numpy as np

cdef packed struct cy_particle_t:
    np.float64_t position_x[2]
    np.int32_t charge

def tester(np.ndarray[cy_particle_t, ndim=1] particles):
    print(particles[0])
Not only does it compile and load, it also works as advertised:
>>> t=np.zeros(2, dtype=particle_t)
>>> t[:]=42
>>> tester(t)
{'charge': 42, 'position_x': [42.0, 42.0]}
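One way to sanity-check that the Cython struct mirrors the numpy dtype is to compare byte sizes (a sketch using the simplified two-field dtype from above):
import numpy as np

particle_t = np.dtype([
    ('position', np.float64, 2),
    ('charge', np.int32, 1),
])

# The packed struct must match the dtype's layout exactly:
# 2 * 8 bytes for position + 4 bytes for charge = 20 bytes, no padding.
print(particle_t.itemsize)  # 20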

Convert numpy ndarray to non-numpy datatype

I'm trying to convert an element of an np.ndarray to a native integer type.
>>> x = np.array([1, 2, 2.5])
>>> type(x[0])
<type 'numpy.float64'>
>>> type(x.astype(int)[0])
<type 'numpy.int64'>
What I'd like is:
>>> type(x.astype('something here')[0])
<type 'int'>
This is the original question, asked in a pandas context, but it turns out to boil down to a problem with np.ndarray.astype().
astype(int) maintains the numpy-ness of the integers in a Series:
>>> s = pd.Series([1,2,3])
>>> type(s[0])
<type 'numpy.int64'>
>>> type(s[0].astype(int))
<type 'numpy.int64'>
Is there any way to cast a Series, or even just one element of a Series, as a native datatype such that the following could be achieved?
>>> type(s[0].dosomething())
<type 'int'>
Why am I asking this?
I'm trying to export a pandas.DataFrame to GEXF format using networkx.write_gexf().
The exporter insists that every value's type(x) be int, float, bool, or one of a few others. It doesn't know what to do with a numpy.int64.
Based on the comments, it might turn out that you don't need this, but to answer the immediate question, you can use the item method. For example:
In [78]: x = np.array([1.0, 2.0, 3.0])
In [79]: x.dtype
Out[79]: dtype('float64')
In [80]: x.item(0)
Out[80]: 1.0
In [81]: type(x.item(0))
Out[81]: float
In [82]: y = np.array([1, 2, 3], dtype=np.int32)
In [83]: type(y.item(0))
Out[83]: int
In [84]: type(y[0])
Out[84]: numpy.int32
To convert the entire array at once, the tolist method converts the elements to the nearest compatible Python type:
In [95]: xlist = x.tolist()
In [96]: xlist
Out[96]: [1.0, 2.0, 3.0]
In [97]: type(xlist[0])
Out[97]: float
In [98]: ylist = y.tolist()
In [99]: ylist
Out[99]: [1, 2, 3]
In [100]: type(ylist[0])
Out[100]: int
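Individual numpy scalars also have the item() method, so the Series element from the question can be unwrapped the same way (a minimal sketch):
import pandas as pd

s = pd.Series([1, 2, 3])

# s[0] is a numpy scalar (numpy.int64); item() unwraps it into the
# nearest native Python type.
print(type(s[0]))         # numpy.int64
print(type(s[0].item()))  # int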

How to properly use ItemLoader in Scrapy? ItemLoader class not working properly

I pass scrapy items via the meta property to a callback function like so:
def parse_VendorDetails(self, response):
    loader = XPathItemLoader(response.meta['item'], response=response)
    print loader.get_collected_values('testField')  # <-- returns empty value
    print response.meta['item']['testField']        # <-- returns expected value
The first print outputs an empty list; the second print returns the value as expected.
What could be the reason for this?
There is currently a bug in ItemLoader: get_output_value() and get_collected_values() ignore the item parameter and only look at ItemLoader._values, so these methods behave inconsistently in that data passed in via the item parameter is not returned:
>>> from scrapy.contrib.loader import ItemLoader
>>> il = ItemLoader(response=response, item=dict(foo=1))
>>> il.add_value('bar', 3)
>>> il._values
defaultdict(<type 'list'>, {'bar': [3]})
>>> il.item
{'foo': 1}
>>> il.get_output_value('foo')
[]
>>> il.get_output_value('bar')
[3]
>>> il.get_collected_values('foo')
[]
>>> il.get_collected_values('bar')
[3]
You can install the proposed patch or simply avoid get_collected_values(). With the patch installed, you can pass the initial data via the new values parameter and get a more consistent result:
>>> from scrapy.contrib.loader import ItemLoader
>>> il = ItemLoader(response=response, item={}, values=dict(foo=1))
>>> il.add_value('bar', 3)
>>> il._values
defaultdict(<type 'list'>, {'foo': [1], 'bar': [3]})
>>> il.item
{}
>>> il.get_output_value('foo')
[1]
>>> il.get_output_value('bar')
[3]
>>> il.get_collected_values('foo')
[1]
>>> il.get_collected_values('bar')
[3]
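As a workaround without the patch, you could fall back to the loader's wrapped item for fields that came in via the item parameter (get_value_or_item below is a hypothetical helper, not part of Scrapy):
def get_value_or_item(loader, field):
    # Hypothetical helper: prefer values collected by the loader itself,
    # falling back to the item passed in via the item parameter.
    values = loader.get_collected_values(field)
    return values if values else loader.item.get(field)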