Pandas dataframe to 2D numpy array

I have the following dataframe:
import numpy as np
import pandas as pd
d = {'histogram': [[1, 2], [3, 4], [5, 6]]}
df = pd.DataFrame(d)
The histograms always have the same length (2 in this example), and I would like to convert the 'histogram' column into a 2D NumPy array to feed into a neural net. The preferred output is:
output_array = np.array(d["histogram"])
i.e.:
array([[1, 2],
       [3, 4],
       [5, 6]])
However, when I try:
df["histogram"].to_numpy()
the result is a 1D object array of lists rather than a 2D numeric array:
array([list([1, 2]), list([3, 4]), list([5, 6])], dtype=object)
This is a problem for the neural net, because I have to specify the input dimensions/shape.
I tried to solve the issue by converting each list to a NumPy array:
df["histogram_arrays"] = df["histogram"].apply(lambda x: np.array(x))
df["histogram_arrays"].to_numpy()
but that still returns a 1D array of arrays, not the 2D array:
array([array([1, 2]), array([3, 4]), array([5, 6])], dtype=object)
How can I get the histograms into a 2D array?

Try this:
np.vstack(df['histogram'])

Your question is essentially: how do I convert a NumPy array of (identically-sized) lists to a two-dimensional NumPy array.
That makes it a (near) duplicate of this SO question, but since your actual question is somewhat hidden, I'll put an answer here anyway.
Use numpy.vstack:
>>> data = df['histogram'].to_numpy()
>>> data
array([list([1, 2]), list([3, 4]), list([5, 6])], dtype=object)
>>> data = np.vstack(data)
>>> data.dtype, data.shape
(dtype('int64'), (3, 2))
>>> data
array([[1, 2],
       [3, 4],
       [5, 6]])
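Two other ways to get the same 2D array are sketched below as alternatives to np.vstack (neither appears in the answers here): np.stack joins the rows along a new first axis, and Series.tolist() lets np.array build the array directly from plain nested lists. Both assume every histogram has the same length; np.stack raises a ValueError otherwise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'histogram': [[1, 2], [3, 4], [5, 6]]})

# Option 1: np.stack joins the 1D rows along a new first axis;
# every row must have the same length.
arr_stack = np.stack(df['histogram'].to_numpy())

# Option 2: turn the column into a plain list of lists first,
# then let np.array build the 2D array directly.
arr_list = np.array(df['histogram'].tolist())

print(arr_stack.shape, arr_list.shape)  # (3, 2) (3, 2)
```

The tolist() route skips the intermediate object array entirely, which can make it the more direct choice when the data only needs to go into a model.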

Related

Converting sparse tensor dense shape into an integer value in tensorflow

If I want to get the shape of a normal tensor in TensorFlow and store the values in a list, I would use the following:
a_shape = [a.shape[0].value, a.shape[1].value]
If I'm not mistaken, .value converts each dimension of the shape to a plain Python integer.
With sparse tensors, I tried the following:
a_sparse_shape=[a.dense_shape[0].value, a.dense_shape[1].value]
However, I get the error message
" 'Tensor' object has no attribute 'value' "
Does anyone have any alternate solutions?
Yes, there is an alternative:
import tensorflow as tf
tensor = tf.random_normal([2, 2, 2, 3])
tensor_shape = tensor.get_shape().as_list()
print(tensor_shape)
# [2, 2, 2, 3]
The same works for sparse tensors:
sparse_tensor = tf.SparseTensor(indices=[[0, 0], [1, 1]],
                                values=[1, 2],
                                dense_shape=[2, 2])
sparse_tensor_shape = sparse_tensor.get_shape().as_list()
print(sparse_tensor_shape)
# [2, 2]

Repeat element from a 2D matrix to a 3D matrix with numpy

I have a 2D NumPy matrix, for example:
M = np.matrix([[1,2],[3,4],[5,6]])
I would like, starting from M, to have a matrix like:
M = np.matrix([[[1,2],[1,2],[1,2]],[[3,4],[3,4],[3,4]],[[5,6],[5,6],[5,6]]])
so the new matrix has 3 dimensions. How can I do this?
The NumPy matrix class can't hold 3D data. So, assuming you are okay with a NumPy array as output, we can extend the array version of M to 3D with None/np.newaxis and then use np.repeat:
np.repeat(np.asarray(M)[:,None],3,axis=1)
Sample run:
In [233]: M = np.matrix([[1,2],[3,4],[5,6]])
In [234]: np.repeat(np.asarray(M)[:,None],3,axis=1)
Out[234]:
array([[[1, 2],
        [1, 2],
        [1, 2]],
       [[3, 4],
        [3, 4],
        [3, 4]],
       [[5, 6],
        [5, 6],
        [5, 6]]])
Alternatively, with np.tile:
np.tile(np.asarray(M),3).reshape(-1,3,M.shape[-1])
This should work for you:
np.array([list(np.array(i)) * 3 for i in M])
As another answerer already said, a matrix can't be three-dimensional.
Instead, you can build a 3-dimensional np.array like this:
import numpy as np
M = np.matrix([[1,2],[3,4],[5,6]])
M = np.array(M)
M = np.array([ [x, x, x] for x in M])
M
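If the 3D result is only read and never modified, np.broadcast_to can produce the same values as a read-only view instead of a copy. This is an alternative not mentioned in the answers; the sketch starts from the plain ndarray equivalent of M (sidestepping np.matrix), and the repeat count n = 3 is an illustrative choice.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # ndarray equivalent of M, shape (3, 2)
n = 3                                    # illustrative: number of copies of each row

# Copying approach, as in the np.repeat answer: add a middle axis, then repeat along it.
copied = np.repeat(A[:, None, :], n, axis=1)  # shape (3, 3, 2)

# Non-copying alternative: a read-only broadcast view with identical values.
view = np.broadcast_to(A[:, None, :], (A.shape[0], n, A.shape[1]))

print(copied.shape, view.shape)      # (3, 3, 2) (3, 3, 2)
print(np.array_equal(copied, view))  # True
```

Writing into the view raises an error because it is read-only; call np.array(view) or use the np.repeat version when an independent, writable array is needed.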

Normalize numpy ndarray data

My data is a NumPy ndarray with shape (2, 3, 4), shown below.
I've tried to normalize each column to a 0-1 scale with sklearn's normalize:
import numpy as np
from sklearn.preprocessing import normalize
x = np.array([[[1, 2, 3, 4],
               [2, 2, 3, 4],
               [3, 2, 3, 4]],
              [[4, 2, 3, 4],
               [5, 2, 3, 4],
               [6, 2, 3, 4]]])
x.shape  # (2, 3, 4)
x = normalize(x, norm='max', axis=0)
However, I get this error:
ValueError: Found array with dim 3. the normalize function expected <= 2.
How do I solve this problem?
scikit-learn's normalize accepts arrays with at most two dimensions. One way to solve this is to reshape to 2D, feed that to normalize, and reshape the resulting 2D array back to the original shape:
from sklearn.preprocessing import normalize
normalize(x.reshape(x.shape[0], -1), norm='max', axis=0).reshape(x.shape)
Alternatively, it's much simpler with NumPy, which works fine with generic ndarrays:
x / np.linalg.norm(x, ord=np.inf, axis=0, keepdims=True)
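To make the NumPy one-liner explicit: with ord=np.inf, np.linalg.norm along a single axis is the maximum absolute value along that axis, so the same scaling can be written with np.abs(...).max(...). The sketch checks that equivalence on the question's x. It divides by the max along axis 0 (the first dimension, of size 2), mirroring axis=0 in the question; the result lands in [0, 1] only because x is non-negative.

```python
import numpy as np

x = np.array([[[1, 2, 3, 4],
               [2, 2, 3, 4],
               [3, 2, 3, 4]],
              [[4, 2, 3, 4],
               [5, 2, 3, 4],
               [6, 2, 3, 4]]])

# Max absolute value along axis 0, kept as shape (1, 3, 4) so it broadcasts against x.
scale = np.abs(x).max(axis=0, keepdims=True)
x_scaled = x / scale

# Compare with the np.linalg.norm(ord=np.inf) version from the answer.
same = np.allclose(x_scaled, x / np.linalg.norm(x, ord=np.inf, axis=0, keepdims=True))
print(x_scaled.shape, same)  # (2, 3, 4) True
```

For example, x[:, 0, 0] is [1, 4], so its scaled values are 0.25 and 1.0.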

Tensorflow: index per row

Suppose I have a tensor of shape (100, 20) and a tensor of indices of shape (100,). How can I obtain a tensor of shape (100,) or (100, 1) that holds, for each of the 100 rows, the value selected by the corresponding entry in the indices tensor?
Small example:
So let's say tensor A is
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
and tensor B is
[0,2,1]
then I want as output
[1,6,8]
You can join your B tensor with an appropriate range to create two-dimensional indices (in your example [[0, 0], [1, 2], [2, 1]]) and then extract the elements using tf.gather_nd:
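import tensorflow as tf
b_2 = tf.expand_dims(b, 1)
# Row numbers 0..n-1 as a column; tf.range yields int32 here, so b must be int32 too
rows = tf.expand_dims(tf.range(tf.shape(b)[0]), 1)
# Since TF 1.0 the signature is tf.concat(values, axis)
ind = tf.concat([rows, b_2], 1)
res = tf.gather_nd(a, ind)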
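To see what tf.gather_nd computes here, the same per-row selection can be written in NumPy (an illustration only, not TensorFlow code): integer-array indexing with a row range gives the (n,) result, and np.take_along_axis gives the (n, 1) variant. The example uses the A and B from the question.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = np.array([0, 2, 1])

rows = np.arange(b.shape[0])

# The (row, column) pairs the TensorFlow answer builds: [[0, 0], [1, 2], [2, 1]]
ind = np.stack([rows, b], axis=1)

# Shape (3,): picks a[0, 0], a[1, 2], a[2, 1]
res = a[rows, b]

# Shape (3, 1): same values, keeping a column axis
res_col = np.take_along_axis(a, b[:, None], axis=1)

print(res)  # [1 6 8]
```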

How to flatten along 3rd dimension in numpy?

I have a 3d array in numpy that I want to flatten into a 1d array. I want to flatten each 2d "layer" of the array, copying each successive layer into the 1d array.
e.g., for an array with arr[:, :, 0] = [[1, 2], [3, 4]] and arr[:, :, 1] = [[5, 6], [7, 8]], I want the output to be [1, 2, 3, 4, 5, 6, 7, 8].
Currently I have the following code:
import numpy as np
out = np.empty(arr.size)
for c in range(arr.shape[2]):  # xrange in Python 2
    layer = arr[:, :, c]
    out[c * layer.size:(c + 1) * layer.size] = layer.ravel()
Is there a way to accomplish this efficiently in numpy (without using a for loop)? I have tried messing around with reshape, transpose, and flatten to no avail.
I figured it out:
out = arr.transpose((2, 0, 1)).flatten()
Or, moving the last axis to the front: np.rollaxis(arr, -1).ravel()
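A quick check on the question's 2x2x2 example. It uses np.moveaxis, which the NumPy documentation recommends over np.rollaxis; that swap is an addition, not part of the original answers.

```python
import numpy as np

# Build the example: layer 0 is [[1, 2], [3, 4]], layer 1 is [[5, 6], [7, 8]].
arr = np.empty((2, 2, 2), dtype=int)
arr[:, :, 0] = [[1, 2], [3, 4]]
arr[:, :, 1] = [[5, 6], [7, 8]]

# Move the layer axis (last) to the front, then flatten in C order.
out_transpose = arr.transpose((2, 0, 1)).ravel()
out_moveaxis = np.moveaxis(arr, -1, 0).ravel()

print(out_transpose)                                # [1 2 3 4 5 6 7 8]
print(np.array_equal(out_transpose, out_moveaxis))  # True
```

ravel() returns a view when it can; after the transpose the data is no longer contiguous, so here it makes a copy, just like flatten().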