Convert multidimensional array elements into same number of arrays - pandas

I am working on a computer vision project and I get the error 'setting an array element with a sequence' when I try to change the data type of the input image matrix.
I realized this happens because the rows of the input image matrix do not all have the same number of elements. Is there a way to convert the input into a 2D array with the same number of elements in each row?
The error occurs when I execute the following line:
X_train = X_train.astype('float32')
Any help would be appreciated.
Cheers.

You need to pad the shorter rows with zeros so that every row is as long as the longest row in the list of lists (matrix).
Here is a code snippet that pads a list of lists of unequal lengths into a matrix with equal row lengths:
import numpy as np
# Keep the ragged data as a plain list of lists: recent NumPy versions raise an
# error when asked to build an array from rows of unequal length
unpadded_matrix = [[1, 2], [3, 4, 5], [6, 7, 8, 9]]
max_len = max(len(row) for row in unpadded_matrix)
np.array([row + [0] * (max_len - len(row)) for row in unpadded_matrix])
Output:
array([[1, 2, 0, 0],
[3, 4, 5, 0],
[6, 7, 8, 9]])
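As a variant of the padding idea above, the rows can be copied into a preallocated array that already has the target dtype, so no separate astype('float32') call is needed. This is only a sketch: it assumes the rows are plain Python lists of numbers, and the helper name pad_rows is hypothetical.

```python
import numpy as np

def pad_rows(rows, fill_value=0.0, dtype="float32"):
    """Pad ragged rows with fill_value into a rectangular 2D array (hypothetical helper)."""
    max_len = max(len(row) for row in rows)
    # Start from an array full of the padding value, then overwrite the known entries.
    padded = np.full((len(rows), max_len), fill_value, dtype=dtype)
    for i, row in enumerate(rows):
        padded[i, :len(row)] = row
    return padded

ragged = [[1, 2], [3, 4, 5], [6, 7, 8, 9]]
X_train = pad_rows(ragged)
print(X_train.shape, X_train.dtype)
```

Whether zero is a sensible padding value depends on the data; for images, resizing to a common shape may be more appropriate than padding.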

Related

Numpy Interpolation for Array of Arrays

I have an array of arrays that I want to interpolate based on each array's min and max.
For a simple m x n array with values ranging from 0 to 1, I can do this as follows:
x_inp = np.interp(x, (x.min(), x.max()), (0, 0.7))
This rescales every existing value into the range 0 to 0.7. However, if I have an array of dimension 100 x m x n, the above method uses the global min/max and not the individual min/max of each m x n array.
Edit:
For example
x1=np.random.randint(0,5, size=(2, 4))
x2=np.random.randint(6,10, size=(2, 4))
my_list=[x1,x2]
my_array=np.asarray(my_list)
print(my_array)
>> array([[[1, 4, 3, 4],
[3, 2, 0, 0]],
[[9, 6, 8, 6],
[8, 7, 6, 7]]])
my_array now has dimension 2x2x4, and my_array.min() and my_array.max() give me 0 and 9. So if I interpolate, the result is not based on the min/max of the individual 2x4 arrays. What I want is for the interpolation to use a min/max of 0/4 for the first array and 6/9 for the second.
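One possible approach, sketched here under the assumption that a linear rescale is what is wanted: compute the min and max over every axis except the first with keepdims=True, so they broadcast back against each sub-array. The function name rescale_each and its parameters are hypothetical, not part of NumPy.

```python
import numpy as np

def rescale_each(arr, new_min=0.0, new_max=0.7):
    """Linearly map each arr[i] from its own [min, max] to [new_min, new_max]."""
    arr = np.asarray(arr, dtype=float)
    axes = tuple(range(1, arr.ndim))          # every axis except the first
    lo = arr.min(axis=axes, keepdims=True)    # shape (k, 1, 1) for a 3D input
    hi = arr.max(axis=axes, keepdims=True)
    # A constant sub-array has hi == lo; use 1.0 there so it maps to new_min
    # instead of dividing by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return new_min + (arr - lo) / span * (new_max - new_min)

my_array = np.array([[[1, 4, 3, 4], [3, 2, 0, 0]],
                     [[9, 6, 8, 6], [8, 7, 6, 7]]])
scaled = rescale_each(my_array)
print(scaled[0].min(), scaled[0].max())
print(scaled[1].min(), scaled[1].max())
```

For this input the result should match calling np.interp separately on each 2x4 array with that array's own min and max.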

What is the difference between np.array([val1, val2]) and np.array([[val1, val2]])?

What is the difference between np.array([1, 2]) and np.array([[1, 2]])?
Which one of them is a matrix?
I also do not understand the output for shape of the above tensors. The former returns (2,) and the latter returns (1,2).
np.array([1, 2]) builds an array starting from a list, thus giving you a 1D array with the shape (2, ) since it only contains a single list of two elements.
When using the double [ you are actually passing a list of lists, thus this gets you a multidimensional array, or matrix, with the shape (1, 2).
With the latter you are able to build more complex matrices like:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rendering a 3x3 matrix:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
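The difference described above can be checked directly by comparing shape and ndim for the two forms. This is a small illustrative sketch; the variable names a and b are arbitrary.

```python
import numpy as np

a = np.array([1, 2])      # one list: a 1D array
b = np.array([[1, 2]])    # a list containing one list: a 2D array with one row

print(a.shape, a.ndim)    # (2,) 1
print(b.shape, b.ndim)    # (1, 2) 2

# Taking the single row of b gives back the same values as a.
print(np.array_equal(b[0], a))    # True
```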

Assign numpy matrix to pandas columns

I have a dataframe with 48870 rows and computed embeddings with shape (48870, 768).
I want to assign these embeddings to a pandas column.
When I try
test['original_text_embeddings'] = embeddings
I get an error: Wrong number of items passed 768, placement implies 1
I know that something like df.loc['original_text_embeddings'] = embeddings[0] will work, but I need to automate this process.
A dataframe/column needs a 1d list/array:
In [84]: x = np.arange(12).reshape(3,4)
In [85]: pd.Series(x)
...
ValueError: Data must be 1-dimensional
Splitting the array into a list (of arrays):
In [86]: pd.Series(list(x))
Out[86]:
0 [0, 1, 2, 3]
1 [4, 5, 6, 7]
2 [8, 9, 10, 11]
dtype: object
In [87]: _.to_numpy()
Out[87]:
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8, 9, 10, 11])],
dtype=object)
Your embeddings have 768 columns, which would translate to equally 768 columns in a data frame. You are trying to assign all columns from the embeddings to just one column in the data frame, which is not possible.
What you could do is generate a new data frame from the embeddings and concatenate the test df with the embedding df:
embedding_df = pd.DataFrame(embeddings)
test = pd.concat([test, embedding_df], axis=1)
Have a look at the documentation for handling indexes and concatenating along different axes:
https://pandas.pydata.org/docs/reference/api/pandas.concat.html
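The two approaches from the answers above can be put side by side on a small stand-in. This is a sketch only: the 3x4 array replaces the real (48870, 768) embeddings, and the "emb_" column prefix and the name wide are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the real data: 3 rows, 4 embedding dimensions.
test = pd.DataFrame({"original_text": ["a", "b", "c"]})
embeddings = np.arange(12, dtype=float).reshape(3, 4)

# Option 1: a single object column where each cell holds one embedding vector.
test["original_text_embeddings"] = list(embeddings)

# Option 2: one numeric column per embedding dimension.
# Reusing test.index keeps the rows aligned during the concat.
embedding_df = pd.DataFrame(embeddings, index=test.index).add_prefix("emb_")
wide = pd.concat([test, embedding_df], axis=1)

print(test.dtypes)
print(wide.columns.tolist())
```

Option 1 keeps the frame narrow but stores Python objects, so vectorized numeric operations on that column are not available. Option 2 keeps everything numeric at the cost of many columns.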

How to check if my data is one-hot encoded

If I have a data matrix, how do I check if the categorical variables have been one-hot encoded or not?
I need to use LIME to explain my prediction, and I read that LIME works only if you have category labels instead of one-hot encoded columns.
I found code to convert it, but it works only if the data has been encoded; otherwise the columns get turned into NaNs.
So I need a piece of code that looks at a numpy array with data and tells me whether it has been one-hot encoded or not.
You can sum each row and check that every sum equals 1, as in the following example:
Example:
X = np.array(
[
[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 1, 0],
[1, 0, 0]
]
)
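# Check every row individually: summing the differences could let a row of all zeros cancel a row with two ones
print(f'X is one-hot-encoded: {bool(np.isin(X, [0, 1]).all() and (X.sum(axis=1) == 1).all())}')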
Result:
X is one-hot-encoded: True
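The row-sum idea can be wrapped in a small helper that also rejects values other than 0 and 1. This is a sketch, assuming the array passed in holds only the columns belonging to one categorical variable; the name is_one_hot is hypothetical.

```python
import numpy as np

def is_one_hot(block):
    """Return True if block is 2D, holds only 0/1, and has exactly one 1 per row."""
    block = np.asarray(block)
    if block.ndim != 2:
        return False
    only_binary = np.isin(block, [0, 1]).all()
    one_per_row = (block.sum(axis=1) == 1).all()
    return bool(only_binary and one_per_row)

X = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
print(is_one_hot(X))                     # valid encoding
print(is_one_hot([[0, 0], [1, 1]]))      # row sums are 0 and 2
print(is_one_hot([[2, -1], [0, 1]]))     # row sums are 1 but values are not binary
```

If the data matrix contains several encoded variables next to each other, the helper would need to be applied to each variable's group of columns separately, since the row sum over all of them equals the number of variables rather than 1.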

How do I swap tensor's axes in TensorFlow?

I have a tensor of shape (30, 116, 10), and I want to swap the first two dimensions so that I have a tensor of shape (116, 30, 10).
I saw that numpy has such a function (np.swapaxes), and I searched for something similar in tensorflow but found nothing.
Do you have any idea?
tf.transpose provides the same functionality as np.swapaxes, although in a more generalized form. In your case, you can do tf.transpose(orig_tensor, [1, 0, 2]) which would be equivalent to np.swapaxes(orig_np_array, 0, 1).
It is possible to use tf.einsum to swap axes if the number of input dimensions is unknown. For example:
tf.einsum("ij...->ji...", input) will swap the first two dimensions of input;
tf.einsum("...ij->...ji", input) will swap the last two dimensions;
tf.einsum("aij...->aji...", input) will swap the second and the third dimension;
tf.einsum("ijk...->kij...", input) will permute the first three dimensions;
and so on.
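The permutation and einsum patterns above can be sanity-checked on shapes. The sketch below uses NumPy as a stand-in so that it runs without TensorFlow installed; it relies on np.transpose taking the same permutation list as tf.transpose and on np.einsum accepting the same ellipsis subscripts.

```python
import numpy as np

x = np.zeros((30, 116, 10))

swapped = np.swapaxes(x, 0, 1)               # swap the first two axes
transposed = np.transpose(x, [1, 0, 2])      # same result via an explicit permutation
via_einsum = np.einsum("ij...->ji...", x)    # same result, any number of trailing axes
last_two = np.einsum("...ij->...ji", x)      # swap the last two axes instead

print(swapped.shape, transposed.shape, via_einsum.shape, last_two.shape)
```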
You can transpose just the last two axes with tf.linalg.matrix_transpose. More generally, you can swap any number of trailing axes by working out the leading indices dynamically and using relative indices for the axes you want to transpose:
import tensorflow as tf
x = tf.ones([5, 3, 7, 11])
trailing_axes = [-1, -2]
leading = tf.range(tf.rank(x) - len(trailing_axes)) # [0, 1]
trailing = trailing_axes + tf.rank(x) # [3, 2]
new_order = tf.concat([leading, trailing], axis=0) # [0, 1, 3, 2]
res = tf.transpose(x, new_order)
res.shape # [5, 3, 11, 7]