What is the difference between the following matrices? - numpy

I have a piece of code like the following. I have to implement image2vector(), which takes an input of shape (length, height, 3) and returns a vector of shape (length*height*3). It doesn't give me the result I expect, and I don't understand the difference between the result I got and the expected one.
def image2vector(image):
    v = None
    v = image.reshape(1, 9, image.shape[0] * image.shape[1] * image.shape[2])
    return v
image = np.array([[[ 0.67826139, 0.29380381],
[ 0.90714982, 0.52835647],
[ 0.4215251 , 0.45017551]],
[[ 0.92814219, 0.96677647],
[ 0.85304703, 0.52351845],
[ 0.19981397, 0.27417313]],
[[ 0.60659855, 0.00533165],
[ 0.10820313, 0.49978937],
[ 0.34144279, 0.94630077]]])
print ("image2vector(image) = " + str(image2vector(image)))
I got the following result:
image2vector(image) = [[ 0.67826139 0.29380381 0.90714982 0.52835647 0.4215251 0.45017551
0.92814219 0.96677647 0.85304703 0.52351845 0.19981397 0.27417313
0.60659855 0.00533165 0.10820313 0.49978937 0.34144279 0.94630077]]
But I want to get the following one:
[[ 0.67826139] [ 0.29380381] [ 0.90714982] [ 0.52835647] [ 0.4215251 ] [ 0.45017551] [ 0.92814219] [ 0.96677647] [ 0.85304703] [ 0.52351845] [ 0.19981397] [ 0.27417313] [ 0.60659855] [ 0.00533165] [ 0.10820313] [ 0.49978937] [ 0.34144279] [ 0.94630077]]
What is the difference between them? How do I get the second matrix from the first one?

Your image does not have the shape (length, height, 3):
In [1]: image = np.array([[[ 0.67826139, 0.29380381],
...: [ 0.90714982, 0.52835647],
...: [ 0.4215251 , 0.45017551]],
...:
...: [[ 0.92814219, 0.96677647],
...: [ 0.85304703, 0.52351845],
...: [ 0.19981397, 0.27417313]],
...:
...: [[ 0.60659855, 0.00533165],
...: [ 0.10820313, 0.49978937],
...: [ 0.34144279, 0.94630077]]])
In [2]: image.shape
Out[2]: (3, 3, 2)
and the reshape you are attempting is not possible:
In [3]: image.reshape(1, 9, image.shape[0] * image.shape[1] * image.shape[2])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-aac5649a99ea> in <module>
----> 1 image.reshape(1, 9, image.shape[0] * image.shape[1] * image.shape[2])
ValueError: cannot reshape array of size 18 into shape (1,9,18)
The array has only 18 elements, and reshape cannot change the number of elements. Dropping the extra 9 works and gives the row you observed:
In [4]: image.reshape(1, image.shape[0] * image.shape[1] * image.shape[2])
Out[4]:
array([[0.67826139, 0.29380381, 0.90714982, 0.52835647, 0.4215251 ,
0.45017551, 0.92814219, 0.96677647, 0.85304703, 0.52351845,
0.19981397, 0.27417313, 0.60659855, 0.00533165, 0.10820313,
0.49978937, 0.34144279, 0.94630077]])
In [5]: _.shape
Out[5]: (1, 18)
The apparently desired shape is:
In [6]: image.reshape(image.shape[0] * image.shape[1] * image.shape[2],1)
Out[6]:
array([[0.67826139],
[0.29380381],
[0.90714982],
[0.52835647],
...
[0.94630077]])
In [7]: _.shape
Out[7]: (18, 1)

The difference is whether you want a plain 1-D array, or a row or column vector.
Usually a column ("vertical") vector has the shape (n, 1) and a row ("horizontal") vector has the shape (1, n).
import numpy as np
image = np.array([[[ 0.67826139, 0.29380381],
[ 0.90714982, 0.52835647],
[ 0.4215251 , 0.45017551]],
[[ 0.92814219, 0.96677647],
[ 0.85304703, 0.52351845],
[ 0.19981397, 0.27417313]],
[[ 0.60659855, 0.00533165],
[ 0.10820313, 0.49978937],
[ 0.34144279, 0.94630077]]])
reshapedImage = image.reshape(18,1)
reshapedImage
array([[0.67826139],
[0.29380381],
[0.90714982],
[0.52835647],
[0.4215251],
[0.45017551],
[0.92814219],
[0.96677647],
[0.85304703],
[0.52351845],
[0.19981397],
[0.27417313],
[0.60659855],
[0.00533165],
[0.10820313],
[0.49978937],
[0.34144279],
[0.94630077]])
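The three shapes discussed above can be compared side by side. This is only a minimal sketch: it assumes the expected output is a column of shape (n, 1), and it uses a small stand-in array built with np.arange rather than the image from the question.

```python
import numpy as np

def image2vector(image):
    # -1 lets NumPy infer the total number of elements,
    # so the result is a column vector of shape (n, 1).
    return image.reshape(-1, 1)

# Stand-in data with the same shape as the array in the question.
image = np.arange(18, dtype=float).reshape(3, 3, 2)

column = image2vector(image)   # column vector, shape (18, 1)
row = image.reshape(1, -1)     # row vector, shape (1, 18)
flat = image.ravel()           # plain 1-D array, shape (18,)

print(column.shape, row.shape, flat.shape)
```

All three hold the same 18 values in the same order; only the number of axes and their lengths differ.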

Tensorflow Dataset operation equal to timeseries_dataset_from_array possible?

I want some more control over the TensorFlow dataset generation. For this reason, I want to mirror the behavior of timeseries_dataset_from_array but with the ability to use consecutive windows or non-overlapping windows (not possible with timeseries_dataset_from_array to set sequence_stride=0).
# df_with_inputs = (x, 19) df_with_labels = (x,1)
ds = tf.data.Dataset.from_tensor_slices((df_with_inputs.values, df_with_labels.values)).window(20, shift=1, stride=1, drop_remainder=True).batch(32)
should be equivalent to:
ds = tf.keras.preprocessing.timeseries_dataset_from_array(df_with_inputs[df_with_inputs.columns], df_with_labels[df_with_labels.columns], sequence_length=window_size,sequence_stride=1,shuffle=False,batch_size=batch_size)
Both create a BatchDataset with the same number of samples, but the type spec of the manually built dataset is different. The first gives me:
<BatchDataset shapes: (DatasetSpec(TensorSpec(shape=(19,), dtype=tf.float32, name=None), TensorShape([None])), DatasetSpec(TensorSpec(shape=(1,), dtype=tf.float32, name=None), TensorShape([None]))), types: (DatasetSpec(TensorSpec(shape=(19,), dtype=tf.float32, name=None), TensorShape([None])), DatasetSpec(TensorSpec(shape=(1,), dtype=tf.float32, name=None), TensorShape([None])))>
whereas the second gives me:
<BatchDataset shapes: ((None, None, 19), (None, 1)), types: (tf.float64, tf.int32)>
Both contain the same number of elements, in my case 3063. Note that stride and sequence_stride behave differently in the two methods (for the same behavior, you need shift=1). Additionally, when I try to feed the first dataset to my NN, I receive the following error (whereas the dataset from timeseries_dataset_from_array works like a charm):
TypeError: Inputs to a layer should be tensors.
Any idea what I am missing here?
My model:
input_shape = (window_size, num_features) #(20,19)
model = tf.keras.Sequential([
tf.keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', padding="same",
input_shape=input_shape), [....]])
The equivalent of this:
import tensorflow as tf
tf.random.set_seed(345)
samples = 30
df_with_inputs = tf.random.normal((samples, 2), dtype=tf.float32)
df_with_labels = tf.random.uniform((samples, 1), maxval=2, dtype=tf.int32)
batch_size = 2
window_size = 20
ds1 = tf.keras.preprocessing.timeseries_dataset_from_array(df_with_inputs, df_with_labels, sequence_length=window_size,sequence_stride=1,shuffle=False, batch_size=batch_size)
for x, y in ds1.take(1):
    print(x, y)
tf.Tensor(
[[[-0.01898661 1.2348452 ]
[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]]
[[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]
[-0.14423785 0.95039433]]], shape=(2, 20, 2), dtype=float32) tf.Tensor(
[[1]
[1]], shape=(2, 1), dtype=int32)
Using tf.data.Dataset.from_tensor_slices would be this:
ds2 = tf.data.Dataset.from_tensor_slices((df_with_inputs, df_with_labels)).batch(batch_size)
inputs_only_ds = ds2.map(lambda x, y: x)
inputs_only_ds = inputs_only_ds.flat_map(tf.data.Dataset.from_tensor_slices).window(window_size, shift=1, stride=1, drop_remainder=True).flat_map(lambda x: x.batch(window_size)).batch(batch_size)
ds2 = tf.data.Dataset.zip((inputs_only_ds, ds2.map(lambda x, y: y)))
for x, y in ds2.take(1):
    print(x, y)
tf.Tensor(
[[[-0.01898661 1.2348452 ]
[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]]
[[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]
[-0.14423785 0.95039433]]], shape=(2, 20, 2), dtype=float32) tf.Tensor(
[[1]
[1]], shape=(2, 1), dtype=int32)
Note that flat_map is necessary to flatten the tensor so that sliding windows can be applied more easily. The function flat_map(lambda x: x.batch(window_size)) simply creates batches of the flattened tensor after applying sliding windows.
With the line inputs_only_ds = ds2.map(lambda x, y: x) I extract only the data (x) without the labels (y) to run sliding windows. Afterwards, in tf.data.Dataset.zip((inputs_only_ds, ds2.map(lambda x, y: y))), I concatenate / zip the dataset with the sliding windows and the labels (y) resulting in the final result ds2.
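For the non-overlapping windows mentioned in the question, the same window/flat_map pattern might be used with shift equal to the window size. The sketch below is an assumption-laden illustration, not the only way to do it: the toy data, the window size of 4, and the variable names are made up, and only inputs (no labels) are windowed. It also shows why flat_map is needed: window() yields a dataset of datasets, which is where the DatasetSpec entries in the question's type spec come from.

```python
import tensorflow as tf

# Toy input: 10 time steps with 2 features each (illustrative data only).
data = tf.reshape(tf.range(20, dtype=tf.float32), (10, 2))
window_size = 4

ds = (
    tf.data.Dataset.from_tensor_slices(data)
    # shift == window_size makes consecutive windows non-overlapping.
    .window(window_size, shift=window_size, drop_remainder=True)
    # window() yields nested datasets; batching each one turns it
    # back into a single tensor of shape (window_size, 2).
    .flat_map(lambda w: w.batch(window_size, drop_remainder=True))
)

windows = list(ds.as_numpy_iterator())
for w in windows:
    print(w.shape)
```

With 10 time steps and a window of 4, the last two steps do not fill a window and are dropped because of drop_remainder=True.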

How to understand the conv2d_transpose in tensorflow

The following is a test for conv2d_transpose.
import tensorflow as tf
import numpy as np
x = tf.constant(np.array([[
[[-67], [-77]],
[[-117], [-127]]
]]), tf.float32)
# shape = (3, 3, 1, 1) -> (height, width, input_channels, output_channels) - 3x3x1 filter
f = tf.constant(np.array([
[[[-1]], [[2]], [[-3]]],
[[[4]], [[-5]], [[6]]],
[[[-7]], [[8]], [[-9]]]
]), tf.float32)
conv = tf.nn.conv2d_transpose(x, f, output_shape=(1, 5, 5, 1), strides=[1, 2, 2, 1], padding='VALID')
The result:
tf.Tensor(
[[[[ 67.]
[ -134.]
[ 278.]
[ -154.]
[ 231.]]
[[ -268.]
[ 335.]
[ -710.]
[ 385.]
[ -462.]]
[[ 586.]
[ -770.]
[ 1620.]
[ -870.]
[ 1074.]]
[[ -468.]
[ 585.]
[-1210.]
[ 635.]
[ -762.]]
[[ 819.]
[ -936.]
[ 1942.]
[-1016.]
[ 1143.]]]], shape=(1, 5, 5, 1), dtype=float32)
To my understanding, it should work as described in Figure 4.5 in the doc.
Therefore, the first element (conv[0,0,0,0]) should be -67*-9=603. Why does it turn out to be 67?
The result may be explained by the following image. But why is the convolution kernel inverted?
To explain this, I made a draw.io figure illustrating the results you obtained.
I hope the illustration helps explain why the first element of the transposed convolution feature map is 67.
A key thing to note:
Unlike traditional convolution, in transposed convolution each element of the input feature map is multiplied by the entire filter, and the resulting intermediate feature maps are overlaid on one another (summed where they overlap) to create the final feature map. The stride determines how far apart the overlays are. In our case, stride = 2, hence the filter moves by 2 in both the x and y dimensions for each successive element of the input feature map.
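The overlay description can be checked with a small NumPy sketch. The helper name conv2d_transpose_valid is made up for illustration and only handles a single channel with 'VALID' padding; it is not TensorFlow's implementation, just the scatter-and-add idea applied to the x and f from the question.

```python
import numpy as np

def conv2d_transpose_valid(x, f, stride):
    """Single-channel transposed convolution with 'VALID' padding."""
    h, w = x.shape
    kh, kw = f.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # Each input element scales the whole filter; the scaled copy
            # is added into the output at an offset of stride * (i, j).
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * f
    return out

x = np.array([[-67., -77.],
              [-117., -127.]])
f = np.array([[-1., 2., -3.],
              [4., -5., 6.],
              [-7., 8., -9.]])

out = conv2d_transpose_valid(x, f, stride=2)
print(out)
```

The top-left output element only receives a contribution from x[0, 0] * f[0, 0] = (-67) * (-1) = 67, which is why the kernel does not appear flipped there. The centre element 1620 is the sum of four overlapping contributions.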

Get column-wise maximums from a NumPy array

I have a 2D array, say
x = np.random.rand(10, 3)
array([[ 0.51158246, 0.51214272, 0.1107923 ],
[ 0.5210391 , 0.85308284, 0.63227215],
[ 0.57239625, 0.06276943, 0.1069803 ],
[ 0.71627613, 0.66454443, 0.56771438],
[ 0.24595493, 0.01007568, 0.84959605],
[ 0.99158904, 0.25034553, 0.00144037],
[ 0.43292656, 0.9247424 , 0.5123086 ],
[ 0.07224077, 0.57230282, 0.88522979],
[ 0.55665913, 0.20119776, 0.58865823],
[ 0.55129624, 0.26226446, 0.63070611]])
Then I find the indexes of maximum elements along the columns:
indexes = np.argmax(x, axis=0)
array([5, 6, 7])
So far so good.
But how do I actually get those elements? That is, how do I get ?some_operation?(x, indexes) == [0.99158904, 0.9247424, 0.88522979]?
Note that I need both the indexes and the associated values.
The best I could come up with was x[indexes, range(x.shape[1])], but it looks kinda complicated and inefficient. Is there a more idiomatic way?
You can use np.amax to find max value along an axis.
Using your example (x is the original array in your post):
In[1]: np.argmax(x, axis=0)
Out[1]:
array([5, 6, 7], dtype=int64)
In[2]: np.amax(x, axis=0)
Out[2]:
array([ 0.99158904, 0.9247424 , 0.88522979])
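If the values should be derived from the indexes that were already computed, rather than with a second reduction, np.take_along_axis is one option. A minimal sketch, using freshly generated random data instead of the array in the question:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 3))

# Row index of the maximum in each column.
indexes = np.argmax(x, axis=0)

# take_along_axis expects indices with the same number of dimensions as x,
# so add a leading axis and drop it again afterwards.
values = np.take_along_axis(x, indexes[np.newaxis, :], axis=0)[0]

# The fancy-indexing form from the question gives the same values.
same_values = x[indexes, np.arange(x.shape[1])]

print(indexes, values)
```

Both forms return one value per column and agree with np.amax(x, axis=0).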

one-hot encoding and existing data

I have a numpy array (N,M) where some of the columns should be one-hot encoded. Please help to make a one-hot encoding using numpy and/or tensorflow.
Example:
[
[ 0.993, 0, 0.88 ]
[ 0.234, 1, 1.00 ]
[ 0.235, 2, 1.01 ]
.....
]
The 2nd column here (with values 0, 1 and 2) should be one-hot encoded. I know that there are only 3 distinct values (0, 1, 2).
The resulting array should look like:
[
[ 0.993, 0.88, 0, 0, 0 ]
[ 0.234, 1.00, 0, 1, 0 ]
[ 0.235, 1.01, 1, 0, 0 ]
.....
]
That way I would be able to feed this array into TensorFlow.
Please note that the 2nd column was removed and its one-hot version was appended at the end of each sub-array.
Any help would be highly appreciated.
Thanks in advance.
Update:
Here is what I have right now:
Well, not exactly...
1. I have more than 3 columns in the array, but I still want to do this only with the 2nd one.
2. The first array is structured, i.e. its shape is (N,)
Here is what I have:
def one_hot(value, max_value):
    value = int(value)
    a = np.zeros(max_value, 'uint8')
    if value != 0:
        a[value] = 1
    return a
# data is structured array with the shape of (N,)
# it has strings, ints, floats inside..
# was get by np.genfromtxt(dtype=None)
unique_values = dict()
unique_values['categorical1'] = 1
unique_values['categorical2'] = 2
for row in data:
    row[col] = unique_values[row[col]]
codes = np.zeros((data.shape[0], len(unique_values)))
idx = 0
for row in data:
    codes[idx] = one_hot(row[col], len(unique_values)) # could be optimised by not creating new array every time
    idx += 1
data = np.c_[data[:, [range(0, col), range(col + 1, 32)]], codes[data[:, col].astype(int)]]
Also trying to concatenate via:
print data.shape # shape (5000,)
print codes.shape # shape (5000,3)
data = np.concatenate((data, codes), axis=1)
Here's one approach -
In [384]: a # input array
Out[384]:
array([[ 0.993, 0. , 0.88 ],
[ 0.234, 1. , 1. ],
[ 0.235, 2. , 1.01 ]])
In [385]: codes = np.array([[0,0,0],[0,1,0],[1,0,0]]) # define codes here
In [387]: codes
Out[387]:
array([[0, 0, 0], # encoding for 0
[0, 1, 0], # encoding for 1
[1, 0, 0]]) # encoding for 2
# Slice out the second column and append one-hot encoded array
In [386]: np.c_[a[:,[0,2]], codes[a[:,1].astype(int)]]
Out[386]:
array([[ 0.993, 0.88 , 0. , 0. , 0. ],
[ 0.234, 1. , 0. , 1. , 0. ],
[ 0.235, 1.01 , 1. , 0. , 0. ]])
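When the code table does not need to be hand-written, the rows of an identity matrix can serve as the lookup. This is a sketch under assumptions: the names col and num_classes are illustrative, and it produces the standard one-hot layout (0 maps to [1, 0, 0]), which differs from the custom codes table above.

```python
import numpy as np

a = np.array([[0.993, 0, 0.88],
              [0.234, 1, 1.00],
              [0.235, 2, 1.01]])

col = 1          # index of the categorical column (assumed)
num_classes = 3  # number of distinct categories (assumed known)

# Row k of the identity matrix is the one-hot vector for category k.
labels = a[:, col].astype(int)
one_hot = np.eye(num_classes)[labels]

# Drop the categorical column and append its one-hot encoding.
result = np.hstack([np.delete(a, col, axis=1), one_hot])
print(result)
```

np.delete removes the column by index, so the same lines work for arrays with more than 3 columns as long as the array is a regular 2-D numeric array rather than a structured one.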

transposing data in array using numpy

I have a list like the following that needs to be transposed into a NumPy array:
samplelist= [ [ ['Name-1','Name-2','Name-3'] , ['Age-1','Age-2','Age-3'] ],
[ ['new_Name_1','new_Name_2','new_Name_3'], ['new_Age_1','new_Age_2','new_Age_3'] ]
]
Expected Result:
samplearray = [ [ ['Name-1','Age-1'], ['Name-2','Age-2'], ['Name-3','Age-3'] ],
[ ['new_Name_1','new_Age_1'], ['new_Name_2','new_Age_2'], ['new_Name_3','new_Age_3'] ]
]
np.transpose results:
np.transpose(a)
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
samplelist is a 3-D array.
In [58]: samplelist.shape
Out[58]: (2, 2, 3)
Using transpose swaps the first and last axes (0 and 2):
In [55]: samplelist.T
Out[55]:
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
In [57]: samplelist.swapaxes(0,2)
Out[57]:
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
To get the desired array, swap axes 1 and 2:
import numpy as np
samplelist = np.array([
[ ['Name-1','Name-2','Name-3'] , ['Age-1','Age-2','Age-3'] ],
[ ['new_Name_1','new_Name_2','new_Name_3'], ['new_Age_1','new_Age_2','new_Age_3'] ]
])
print(samplelist.swapaxes(1,2))
# [[['Name-1' 'Age-1']
# ['Name-2' 'Age-2']
# ['Name-3' 'Age-3']]
# [['new_Name_1' 'new_Age_1']
# ['new_Name_2' 'new_Age_2']
# ['new_Name_3' 'new_Age_3']]]
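The same result can also be written with np.transpose and an explicit axis order, which may read more clearly when more than two axes are involved. A short sketch reusing the data above:

```python
import numpy as np

samplelist = np.array([
    [['Name-1', 'Name-2', 'Name-3'], ['Age-1', 'Age-2', 'Age-3']],
    [['new_Name_1', 'new_Name_2', 'new_Name_3'],
     ['new_Age_1', 'new_Age_2', 'new_Age_3']],
])

# Keep axis 0 in place and exchange axes 1 and 2.
by_transpose = np.transpose(samplelist, (0, 2, 1))
by_swapaxes = samplelist.swapaxes(1, 2)

print(by_transpose.shape)
print(by_transpose[0, 0])
```

The tuple (0, 2, 1) lists the old axes in their new order, so the shape goes from (2, 2, 3) to (2, 3, 2).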