Finding a tuple inside an array and getting its index and value - numpy

I have an array like this:
np.array([[(115, 1), 47],
[(115, 2), 1],
[(115, 3), 3],
[(2147482888, 5), 26],
[(275030867, 5), 3]], dtype=object)
How do I find a given tuple, say (115, 1), get its corresponding value 47, and update it with addition or subtraction based on certain conditions?
For example, if the array contains (115, 1), I want to add 2 to its value (47 + 2).
If the array does not contain (10009, 10), I want to add it to the array with a default value, say 10.

Arrays are best used for contiguous blocks of homogeneous data (an n-dimensional array of floats, for example). It sounds like you want a dict, which stores key-value pairs. In that case, you can do the following:
import numpy as np

# original data
x = np.array([[(115, 1), 47],
[(115, 2), 1],
[(115, 3), 3],
[(2147482888, 5), 26],
[(275030867, 5), 3],
[0, 0]], dtype=object)
d = dict(x)       # each [key, value] row becomes one dict entry
d[(115,1)] += 2   # 47 -> 49
Resulting d:
{(115, 1): 49,
(115, 2): 1,
(115, 3): 3,
(2147482888, 5): 26,
(275030867, 5): 3,
0: 0}
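The answer above only covers the update. A minimal sketch of both steps the question asks for (bump an existing key, or insert a missing key with a default) could look like this; the helper name `bump` and the default of 10 are illustrative assumptions, not part of the original answer:

```python
import numpy as np

# Same data as in the question (without the extra [0, 0] row).
arr = np.array([[(115, 1), 47],
                [(115, 2), 1],
                [(115, 3), 3],
                [(2147482888, 5), 26],
                [(275030867, 5), 3]], dtype=object)

# tolist() turns each row into a [key, value] list of plain Python objects.
d = dict(arr.tolist())

def bump(d, key, delta, default):
    """Hypothetical helper: add delta if key exists, else insert default."""
    if key in d:
        d[key] += delta
    else:
        d[key] = default

bump(d, (115, 1), 2, 10)     # existing key: 47 -> 49
bump(d, (10009, 10), 2, 10)  # missing key: inserted with value 10
```

If you need the pairs back afterwards, list(d.items()) gives them as (key, value) tuples.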

numpy's "fast and compact" advantage only applies to numeric arrays. With object dtype, speed and storage are similar to lists (and may even be worse).
In [39]: arr =np.array([[(115, 1), 47],
...: [(115, 2), 1],
...: [(115, 3), 3],
...: [(2147482888, 5), 26],
...: [(275030867, 5), 3]], dtype=object)
In [40]: arr.shape
Out[40]: (5, 2)
One advantage of such an array is that it's easy to access 'columns':
In [41]: arr[:,0]
Out[41]:
array([(115, 1), (115, 2), (115, 3), (2147482888, 5), (275030867, 5)],
dtype=object)
In [42]: arr[:,1]
Out[42]: array([47, 1, 3, 26, 3], dtype=object)
The whole array as a list of lists:
In [43]: alist = arr.tolist()
In [44]: alist
Out[44]:
[[(115, 1), 47],
[(115, 2), 1],
[(115, 3), 3],
[(2147482888, 5), 26],
[(275030867, 5), 3]]
But for finding a tuple, it's easier to work with a list of the tuples:
In [47]: alist = arr[:,0].tolist()
In [48]: alist
Out[48]: [(115, 1), (115, 2), (115, 3), (2147482888, 5), (275030867, 5)]
In [49]: alist.index((115,1))
Out[49]: 0
In [50]: alist.index((115,3))
Out[50]: 2
In [51]: arr[2,:]
Out[51]: array([(115, 3), 3], dtype=object)
That said, your structure and desired actions look a lot more like dictionary operations. Finding 'keys' and adding them is not a natural fit for numpy.
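If you do want to stay with the object array, the list-index idea above can be wrapped into an update-or-append step. This is only a sketch; `update_or_add` is a hypothetical name, and appending builds a new array because numpy arrays cannot grow in place:

```python
import numpy as np

def update_or_add(arr, key, delta, default):
    """Add delta to key's value if present, else append [key, default].

    Hypothetical helper. Updates arr in place when the key exists;
    returns a new, longer array when it has to append.
    """
    keys = arr[:, 0].tolist()          # list of tuples, compared by value
    if key in keys:
        arr[keys.index(key), 1] += delta
        return arr
    new_row = np.empty((1, 2), dtype=object)
    new_row[0, 0] = key                # element assignment stores the tuple as-is
    new_row[0, 1] = default
    return np.concatenate([arr, new_row], axis=0)

arr = np.array([[(115, 1), 47],
                [(115, 2), 1],
                [(115, 3), 3],
                [(2147482888, 5), 26],
                [(275030867, 5), 3]], dtype=object)

arr = update_or_add(arr, (115, 1), 2, 10)     # 47 -> 49
arr = update_or_add(arr, (10009, 10), 2, 10)  # appended as a 6th row
```

Each call scans the whole key column, and each append copies the whole array, which is why a dict is the better fit when there are many updates.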

Related

How do you flatten a 3d numpy array into a 2-d array of tuples?

I have the following numpy array:
array([[[1, 1], [0, 5]],
[[1, 2], [1, 6]],
[[0, 3], [0, 7]]]
)
It has shape (3, 2, 2).
I'd like to reshape it into a 3x2 array of tuples, i.e.:
array([[(1, 1), (0, 5)],
[(1, 2), (1, 6)],
[(0, 3), (0, 7)]]
)
Is there any way to do this in numpy without a python loop? My actual numpy array is very large.
In [48]: arr = np.array([[[1, 1], [0, 5]],
...: [[1, 2], [1, 6]],
...: [[0, 3], [0, 7]]]
...: )
In [49]: arr
Out[49]:
array([[[1, 1],
[0, 5]],
[[1, 2],
[1, 6]],
[[0, 3],
[0, 7]]])
In [50]: import pandas as pd
In [51]: pd.DataFrame(arr)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
ValueError: Must pass 2-d input. shape=(3, 2, 2)
So make a 2d object-dtype array, starting with a "blank":
In [52]: res = np.empty((3,2),object)
In [53]: res[:] = arr
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [53], in <cell line: 1>()
----> 1 res[:] = arr
ValueError: could not broadcast input array from shape (3,2,2) into shape (3,2)
But it works if we first make a pure Python list:
In [54]: res[:] = arr.tolist()
In [55]: res
Out[55]:
array([[list([1, 1]), list([0, 5])],
[list([1, 2]), list([1, 6])],
[list([0, 3]), list([0, 7])]], dtype=object)
In [56]: pd.DataFrame(res)
Out[56]:
0 1
0 [1, 1] [0, 5]
1 [1, 2] [1, 6]
2 [0, 3] [0, 7]
To make those object elements tuples, we need a list comprehension. This is easiest if res starts as 1d:
In [64]: res = np.empty((6),object)
In [65]: [tuple(x) for x in arr.reshape(-1,2).tolist()]
Out[65]: [(1, 1), (0, 5), (1, 2), (1, 6), (0, 3), (0, 7)]
In [66]: res[:] = [tuple(x) for x in arr.reshape(-1,2).tolist()]
In [67]: res
Out[67]: array([(1, 1), (0, 5), (1, 2), (1, 6), (0, 3), (0, 7)], dtype=object)
In [68]: res.reshape(3,2)
Out[68]:
array([[(1, 1), (0, 5)],
[(1, 2), (1, 6)],
[(0, 3), (0, 7)]], dtype=object)
Another approach is to make a structured array, whose 'tolist' produces tuples:
In [80]: import numpy.lib.recfunctions as rf
In [81]: rf.unstructured_to_structured(arr, dtype=np.dtype('i,i'))
Out[81]:
array([[(1, 1), (0, 5)],
[(1, 2), (1, 6)],
[(0, 3), (0, 7)]], dtype=[('f0', '<i4'), ('f1', '<i4')])
In [82]: pd.DataFrame(_.tolist())
Out[82]:
0 1
0 (1, 1) (0, 5)
1 (1, 2) (1, 6)
2 (0, 3) (0, 7)
It's shorter, but I haven't compared the speed.
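The list-comprehension approach above hard-codes the (3, 2) result shape. A sketch of the same idea generalized to any leading shape and any last-axis length might look like this; `to_tuple_array` is a hypothetical name, and it still uses a Python-level comprehension, so it does not remove the loop:

```python
import numpy as np

def to_tuple_array(arr):
    """Collapse the last axis of arr into tuples, returning an object array.

    Hypothetical helper generalizing the (3, 2, 2) -> (3, 2) example.
    """
    lead_shape = arr.shape[:-1]
    rows = arr.reshape(-1, arr.shape[-1]).tolist()  # plain Python lists
    res = np.empty(len(rows), dtype=object)
    # Same trick as above: with a 1d object destination,
    # numpy stores each tuple as a single element.
    res[:] = [tuple(r) for r in rows]
    return res.reshape(lead_shape)

arr = np.array([[[1, 1], [0, 5]],
                [[1, 2], [1, 6]],
                [[0, 3], [0, 7]]])
tuples = to_tuple_array(arr)  # shape (3, 2), elements like (0, 5)
```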

A question about axis in tensorflow.stack (tensorflow==1.14)

When using tensorflow.stack, what does axis=-1 mean?
I'm using tensorflow==1.14
Using axis=-1 simply means stacking the tensors along a new last axis of the output (negative values count from the end, as in Python list indexing).
Let's look at what this means using these tensors of shape (2, 2):
>>> x = tf.constant([[1, 2], [3, 4]])
>>> y = tf.constant([[5, 6], [7, 8]])
>>> z = tf.constant([[9, 10], [11, 12]])
The default behavior of tf.stack, as described in the documentation, is to stack the tensors along the first axis (index 0), resulting in a tensor of shape (3, 2, 2):
>>> tf.stack([x, y, z], axis=0)
<tf.Tensor: shape=(3, 2, 2), dtype=int32, numpy=
array([[[ 1, 2],
[ 3, 4]],
[[ 5, 6],
[ 7, 8]],
[[ 9, 10],
[11, 12]]], dtype=int32)>
With axis=-1, the three tensors are stacked along the last axis instead, resulting in a tensor of shape (2, 2, 3):
>>> tf.stack([x, y, z], axis=-1)
<tf.Tensor: shape=(2, 2, 3), dtype=int32, numpy=
array([[[ 1, 5, 9],
[ 2, 6, 10]],
[[ 3, 7, 11],
[ 4, 8, 12]]], dtype=int32)>
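The same axis rule can be checked without TensorFlow: np.stack is documented with the same convention (the new axis is inserted at position `axis`, and negative values count from the end of the output's dimensions). This sketch uses NumPy as a stand-in for tf.stack, on the assumption that the two behave as their documentation describes:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
z = np.array([[9, 10], [11, 12]])

first = np.stack([x, y, z], axis=0)   # new axis in front: shape (3, 2, 2)
last = np.stack([x, y, z], axis=-1)   # new axis at the end: shape (2, 2, 3)

# For 2d inputs the output is 3d, so axis=-1 is the same as axis=2.
same_as_2 = np.stack([x, y, z], axis=2)

# Each input is recovered by indexing the new last axis.
recovered = [last[..., k] for k in range(3)]
```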

How to iterate through slices at the last dimension

For example, take this array:
a = np.array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 6, 7, 8],
[ 9, 10, 11]]])
We want to iterate through the slices along the last dimension, i.e. [0,1,2], [3,4,5], [6,7,8], [9,10,11]. Is there a way to achieve this without a for loop?
I tried the following, but it does not work because numpy does not interpret the tuple the way we want: a[(0, 0),:] is not the same as a[0, 0, :]
[a[i,:] for i in zip(*product(*(range(ii) for ii in a.shape[:-1])))]
More generally, is there a way to do this for the last k dimensions, something equivalent to looping through a[i,j,k, ...]?
In [26]: a = np.array([[[ 0, 1, 2],
...: [ 3, 4, 5]],
...:
...: [[ 6, 7, 8],
...: [ 9, 10, 11]]])
In [27]: [a[i,j,:] for i in range(2) for j in range(2)]
Out[27]: [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10, 11])]
or
In [31]: list(np.ndindex(2,2))
Out[31]: [(0, 0), (0, 1), (1, 0), (1, 1)]
In [32]: [a[i,j] for i,j in np.ndindex(2,2)]
Another option:
list(a.reshape(-1,3))
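For the "last k dimensions" part of the question, both ideas above generalize by taking the leading shape from a.shape instead of hard-coding (2, 2) and 3. A sketch, with hypothetical helper names and assuming 1 <= k <= a.ndim:

```python
import numpy as np

def last_k_slices(a, k):
    """Every sub-array spanning the last k dimensions of a.

    Equivalent to nested loops over the leading a.ndim - k indices.
    """
    lead = a.shape[:-k]
    return [a[idx] for idx in np.ndindex(*lead)]

def last_k_slices_reshape(a, k):
    """Same slices via reshape: the first axis enumerates them."""
    return a.reshape(-1, *a.shape[-k:])

a = np.arange(12).reshape(2, 2, 3)  # same values as the example array
rows = last_k_slices(a, 1)          # 4 slices of length 3
blocks = last_k_slices(a, 2)        # 2 slices of shape (2, 3)
```

The ndindex version returns views into a; the reshape version returns a single array (also a view when a is contiguous), which is usually what you want if you then process the slices with vectorized code.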

How to reshape numpy array of shape (4, 1, 1) into (4, 2, 1)?

Suppose I have a numpy array
arr = np.array([1, 4, 4, 5]).reshape((4, 1, 1))
Now I want to reshape arr into arr1 such that
>>> print(arr1)
[[[1]
[1]]
[[4]
[4]]
[[4]
[4]]
[[5]
[5]]]
>>> arr1.shape
(4, 2, 1)
Please help, I'm really stuck on this.
In [484]: arr = np.array([1, 4, 4, 5]).reshape((4, 1, 1))
In [485]: np.concatenate([arr,arr],axis=1)
Out[485]:
array([[[1],
[1]],
[[4],
[4]],
[[4],
[4]],
[[5],
[5]]])
In [486]: np.repeat(arr,2,1)
Out[486]:
array([[[1],
[1]],
[[4],
[4]],
[[4],
[4]],
[[5],
[5]]])
Speeds are similar, with a slight edge for repeat, but not enough to matter. np.hstack is a concatenate on axis 1 (for arrays with 2 or more dimensions), so np.hstack([arr, arr]) works too.
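Because the axis being doubled has length 1, two other standard calls give the same values: np.tile, and np.broadcast_to, which returns a read-only view instead of a copy (only suitable if you don't need to write into the result). A short comparison sketch:

```python
import numpy as np

arr = np.array([1, 4, 4, 5]).reshape((4, 1, 1))

repeated = np.repeat(arr, 2, axis=1)    # copy, shape (4, 2, 1)
tiled = np.tile(arr, (1, 2, 1))         # copy, same values
view = np.broadcast_to(arr, (4, 2, 1))  # read-only view, no copy
```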

Matrix multiplication of two vectors

I'm trying to do a matrix multiplication of two vectors in numpy that should result in a 2d array.
Example:
In [108]: b = array([[1],[2],[3],[4]])
In [109]: a =array([1,2,3])
In [111]: b.shape
Out[111]: (4, 1)
In [112]: a.shape
Out[112]: (3,)
In [113]: b.dot(a)
ValueError: objects are not aligned
As the shapes show, a isn't actually a matrix; it is a 1d array. The catch is to define a like this:
In [114]: a =array([[1,2,3]])
In [115]: a.shape
Out[115]: (1, 3)
In [116]: b.dot(a)
Out[116]:
array([[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12]])
How can I achieve the same result when the vectors are taken as columns or rows of a matrix?
In [137]: mat = array([[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12]])
In [138]: x = mat[:,0] #[1,2,3,4]
In [139]: y = mat[0,:] #[1,2,3]
In [140]: x.dot(y)
ValueError: objects are not aligned
You are computing the outer product of two vectors. You can use the function numpy.outer for this:
In [18]: a
Out[18]: array([1, 2, 3])
In [19]: b
Out[19]: array([10, 20, 30, 40])
In [20]: numpy.outer(b, a)
Out[20]:
array([[ 10, 20, 30],
[ 20, 40, 60],
[ 30, 60, 90],
[ 40, 80, 120]])
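Applied to the follow-up in the question, np.outer works directly on the 1d column and row taken from mat, with no reshaping. A sketch (np.multiply.outer is the general ufunc form of the same operation):

```python
import numpy as np

mat = np.array([[1, 2, 3],
                [2, 4, 6],
                [3, 6, 9],
                [4, 8, 12]])

x = mat[:, 0]   # 1d, shape (4,)
y = mat[0, :]   # 1d, shape (3,)

outer = np.outer(x, y)              # shape (4, 3)
ufunc_outer = np.multiply.outer(x, y)  # same result via the ufunc method
```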
Use 2d arrays instead of 1d vectors, and broadcasting with *:
In [8]: #your code from above
In [9]: y = mat[0:1,:]
In [10]: y
Out[10]: array([[1, 2, 3]])
In [11]: x = mat[:,0:1]
In [12]: x
Out[12]:
array([[1],
[2],
[3],
[4]])
In [13]: x*y
Out[13]:
array([[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12]])
It's the same catch as in the basic example.
Neither x nor y is treated as a matrix; both are one-dimensional arrays.
In [143]: x.shape
Out[143]: (4,)
In [144]: y.shape
Out[144]: (3,)
We have to add a second dimension, of size 1, to each of them.
In [171]: x = array([x]).transpose()
In [172]: x.shape
Out[172]: (4, 1)
In [173]: y = array([y])
In [174]: y.shape
Out[174]: (1, 3)
In [175]: x.dot(y)
Out[175]:
array([[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12]])
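Adding the size-1 dimension can also be done by indexing with np.newaxis (an alias of None) or with reshape, which avoids wrapping the vector in a list and transposing it. A sketch of the equivalent forms; the variable names are illustrative:

```python
import numpy as np

x = np.array([1, 2, 3, 4])   # shape (4,)
y = np.array([1, 2, 3])      # shape (3,)

col = x[:, np.newaxis]       # shape (4, 1)
row = y[np.newaxis, :]       # shape (1, 3)

via_dot = col.dot(row)       # (4, 1) x (1, 3) -> (4, 3)
via_matmul = col @ row       # same matrix product with the @ operator
via_broadcast = col * row    # elementwise broadcasting gives the same here
via_reshape = x.reshape(-1, 1) @ y.reshape(1, -1)
```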