I am unable to interpret the results of get_weights from a GRU layer. Here's my code -
#Modified from - https://machinelearningmastery.com/understanding-simple-recurrent-neural-networks-in-keras/
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, GRU
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt
model = Sequential()
model.add(GRU(units = 2, input_shape = (3,1), activation = 'linear'))
model.add(Dense(units = 1, activation = 'linear'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
initial_weights = model.layers[0].get_weights()
print("Shape = ",initial_weights)
I am familiar with GRU concepts. In addition, I understand how get_weights works for the Keras SimpleRNN layer, where the first array represents the input weights, the second the recurrent (hidden-state) weights and the third the bias. However, I am lost with the output for GRU, which is given below -
Shape = [array([[-0.64266175, -0.0870676 , -0.25356603, -0.03685969, 0.22260845,
-0.04923642]], dtype=float32), array([[ 0.01929092, -0.4932567 , 0.3723044 , -0.6559699 , -0.33790302,
0.27062896],
[-0.4214194 , 0.46456426, 0.27233726, -0.00461334, -0.6533575 ,
-0.32483965]], dtype=float32), array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]], dtype=float32)]
I am assuming it has something to do with GRU gates.
Update:7/4 - This page says that keras GRU has 3 gates, update, reset and output. However, based on this, GRU shouldn't have the output gate.
Best way I know would be to track the add_weight() calls in the build() function of the GRUCell.
Let's take an example model,
import tensorflow as tf
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.GRU(32, input_shape=(5, 10), name='gru'),
        tf.keras.layers.Dense(10)
    ]
)
Now we'll print some metadata about what's returned by weights = model.get_layer('gru').get_weights(), which gives,
Number of arrays in weights: 3
Shape of each array in weights: [(10, 96), (32, 96), (2, 96)]
Let's go back to the weights defined by the GRUCell. We have,
self.kernel = self.add_weight(
    shape=(input_dim, self.units * 3),
    ...
)
self.recurrent_kernel = self.add_weight(
    shape=(self.units, self.units * 3),
    ...
)
...
bias_shape = (2, 3 * self.units)
self.bias = self.add_weight(
    shape=bias_shape,
    ...
)
This is what you're seeing as weights (in that order). Here's why they are shaped like this. GRU computations are outlined here.
The first matrix in weights (of shape [10, 96]) is a concatenation of Wz|Wr|Wh (in that order). Each of these is a [10, 32] sized tensor. Concatenation gives a [10, 32*3=96] sized tensor.
Similarly, the second matrix is a concatenation of Uz|Ur|Uh. Each of these is a [32, 32] sized tensor which becomes [32, 96] after concatenation.
You can see how they break this combined weight matrix to each of z, r and h components here.
Finally, the bias. It is a [2, 96] sized tensor holding 2 biases: input_bias and recurrent_bias. Again, the biases from all gates/weights are combined into a single tensor. Both rows exist because reset_after (which decides how the reset gate is applied) is set to True, the default in TF 2; in that mode the recurrent_bias is added to the recurrent projection. With reset_after=False there is only a single bias, of shape [96]. It's an implementation detail.
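The splitting described above can be sketched with plain NumPy. This is only an illustration: the three arrays are random stand-ins with the same shapes as the get_weights() output, and the names W_z, U_z, b_z and so on are chosen here for readability, not taken from Keras.

```python
import numpy as np

units, input_dim = 32, 10

# Random stand-ins for the three arrays returned by get_weights();
# only the shapes matter for this sketch.
rng = np.random.default_rng(0)
kernel = rng.normal(size=(input_dim, 3 * units))        # (10, 96)
recurrent_kernel = rng.normal(size=(units, 3 * units))  # (32, 96)
bias = rng.normal(size=(2, 3 * units))                  # (2, 96)

# Split along the last axis into update (z), reset (r) and candidate (h) blocks.
W_z, W_r, W_h = np.split(kernel, 3, axis=1)
U_z, U_r, U_h = np.split(recurrent_kernel, 3, axis=1)

# The two rows of the bias are the input bias and the recurrent bias.
input_bias, recurrent_bias = bias
b_z, b_r, b_h = np.split(input_bias, 3)

print(W_z.shape, U_z.shape, b_z.shape)  # (10, 32) (32, 32) (32,)
```

With a real model you would replace the three random arrays with kernel, recurrent_kernel, bias = model.get_layer('gru').get_weights().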
import pandas as pd
import numpy as np
df = pd.read_csv('shops.csv', sep='|')
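df.columns = ['name',    # business name
              'cate_1',  # mid-level category name
              'cate_2',  # sub-category name
              'cate_3',  # standard industry classification name
              'dong',    # administrative district (dong) name
              'lon',     # latitude
              'lat'      # longitude
              ]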
df['cate_mix'] = df['cate_1'] + df['cate_2'] + df['cate_3']
df['cate_mix'] = df['cate_mix'].str.replace("/", " ")
from sklearn.feature_extraction.text import CountVectorizer # feature vectorization
from sklearn.metrics.pairwise import cosine_similarity # cosine similarity
count_vect_category = CountVectorizer(min_df=0, ngram_range=(1,2))
place_category = count_vect_category.fit_transform(df['cate_mix'])
place_simi_cate = cosine_similarity(place_category, place_category)
place_simi_cate_sorted_ind = place_simi_cate.argsort()[:, ::-1]
I want to compute the same cosine similarity as above, but with TensorFlow. Is there any way to do that?
Example:
import tensorflow as tf
y_true = [[0., 1.], [1., 1.]]
y_pred = [[1., 0.], [1., 1.]]
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
cosine_loss(y_true, y_pred).numpy()  # approximately -0.5; the loss is the negative mean cosine similarity
Source: TensorFlow docs
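The loss above compares matching rows only. For a full pairwise matrix like the one sklearn's cosine_similarity returns, one possible sketch is to L2-normalize the rows and multiply the matrix by its own transpose. The small counts tensor is a made-up stand-in for the count matrix; it is an assumption here that the sparse CountVectorizer output has first been converted to a dense array (for example with .toarray()), which is only practical for moderately sized data.

```python
import tensorflow as tf

# Hypothetical dense stand-in for the count matrix, one row per shop.
counts = tf.constant([[1., 0., 2.],
                      [0., 1., 0.],
                      [2., 0., 4.]])

# Scale every row to unit length, then take all pairwise dot products.
normalized = tf.math.l2_normalize(counts, axis=1)
similarity = tf.matmul(normalized, normalized, transpose_b=True)

# Most similar rows first, mirroring argsort()[:, ::-1] in the NumPy version.
sorted_ind = tf.argsort(similarity, axis=1, direction='DESCENDING')

print(similarity.numpy())
print(sorted_ind.numpy())
```

Rows 0 and 2 point in the same direction, so their similarity is 1, while row 1 is orthogonal to both and gets 0.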
For the below pandas DataFrame df, I want to transform the type column to OneHotEncoding, and transform the word column to its vector representation using the dictionary word2vec. Then I want to concatenate the two transformed vectors with the count column to form the final feature for classification.
>>> df
word type count
0 apple A 4
1 cat B 3
2 mountain C 1
>>> df.dtypes
word object
type category
count int64
>>> word2vec
{'apple': [0.1, -0.2, 0.3], 'cat': [0.2, 0.2, 0.3], 'mountain': [0.4, -0.2, 0.3]}
I defined a custom Transformer and used FeatureUnion to concatenate the features.
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import OneHotEncoder
class w2vTransformer(TransformerMixin):
    def __init__(self,word2vec):
        self.word2vec = word2vec
    def fit(self,x, y=None):
        return self
    def wv(self, w):
        return self.word2vec[w] if w in self.word2vec else [0, 0, 0]
    def transform(self, X, y=None):
        return df['word'].apply(self.wv)
pipeline = Pipeline([
    ('features', FeatureUnion(transformer_list=[
        # Part 1: get integer column
        ('numericals', Pipeline([
            ('selector', TypeSelector(np.number)),
        ])),
        # Part 2: get category column and its onehotencoding
        ('categoricals', Pipeline([
            ('selector', TypeSelector('category')),
            ('labeler', StringIndexer()),
            ('encoder', OneHotEncoder(handle_unknown='ignore')),
        ])),
        # Part 3: transform word to its embedding
        ('word2vec', Pipeline([
            ('w2v', w2vTransformer(word2vec)),
        ]))
    ])),
])
When I run pipeline.fit_transform(df), I got the error: blocks[0,:] has incompatible row dimensions. Got blocks[0,2].shape[0] == 1, expected 3.
However, if I remove the word2vec Transformer (Part 3) from the pipeline, the pipeline (Part 1 + Part 2) works fine.
>>> pipeline_no_word2vec.fit_transform(df).todense()
matrix([[4., 1., 0., 0.],
[3., 0., 1., 0.],
[1., 0., 0., 1.]])
And if I keep only the w2v transformer in the pipeline, it also works.
>>> pipeline_only_word2vec.fit_transform(df)
array([list([0.1, -0.2, 0.3]), list([0.2, 0.2, 0.3]),
list([0.4, -0.2, 0.3])], dtype=object)
My guess is that there is something wrong in my w2vTransformer class, but I don't know how to fix it. Please help.
This error is due to the fact that the FeatureUnion expects a 2-d array from each of its parts.
The first two parts of your FeatureUnion, 'numericals' and 'categoricals', correctly return 2-d data of shape (n_samples, n_features).
n_samples = 3 in your example data. n_features depends on the individual part (OneHotEncoder changes it in the 2nd part, but it is 1 in the first part).
But the third part, 'word2vec', returns a pandas.Series object which has the 1-d shape (3,). FeatureUnion treats this as shape (1, 3) by default and hence complains that it does not match the other blocks.
So you need to correct that shape.
Now even if you simply do a reshape() at the end and change it to shape (3,1), your code will not run, because the contents of that array are lists from your word2vec dict, which are not converted to a 2-d array. Instead it becomes an array of lists.
Change the w2vTransformer to correct the error:
class w2vTransformer(TransformerMixin):
    ...
    ...
    def transform(self, X, y=None):
        return np.array([np.array(vv) for vv in X['word'].apply(self.wv)])
And after that the pipeline will work.
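The shape difference can be checked outside the pipeline. The following is a minimal sketch using only pandas and NumPy; the data is copied from the question, and np.hstack merely stands in for the stacking that FeatureUnion does internally.

```python
import numpy as np
import pandas as pd

word2vec = {'apple': [0.1, -0.2, 0.3],
            'cat': [0.2, 0.2, 0.3],
            'mountain': [0.4, -0.2, 0.3]}
df = pd.DataFrame({'word': ['apple', 'cat', 'mountain'], 'count': [4, 3, 1]})

# A Series of Python lists is 1-d, so it cannot be stacked with 2-d blocks.
as_series = df['word'].apply(lambda w: word2vec.get(w, [0, 0, 0]))
print(as_series.shape)   # (3,)

# One row per sample, one column per embedding dimension.
as_matrix = np.array(as_series.tolist())
print(as_matrix.shape)   # (3, 3)

# A 2-d block stacks cleanly next to other (n_samples, n_features) blocks.
features = np.hstack([df[['count']].to_numpy(), as_matrix])
print(features.shape)    # (3, 4)
```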
I have a tensor say:
y_true = np.array([[[1.], [0.], [3.]], [[5.], [0.], [0.]]])
I want to iterate over y_true, accessing all individual values. I want to do something like the following Java-style loop:
for(i=0;i<y_true.length;i++){
    arr2 = y_true[i];
    for(j=0;j<arr2.length;j++){
        print(arr2[j][0])
    }
}
Are you looking for slicing with [:,:,0]?
>>> y_true[:,:,0]
array([[1., 0., 3.],
[5., 0., 0.]])
There are 2 cases:
If you know the rank (dimensionality) of your numpy array (y_true in the example has rank 3), you can check the y_true.shape property, which gives the exact size of each dimension. You can then write as many nested for loops as the rank of y_true and output each element separately, for example:
import numpy as np
y_true = np.array([[[1.], [0.], [3.]], [[5.], [0.], [0.]]])
dims = y_true.shape
for i in range(dims[0]):
    for j in range(dims[1]):
        for k in range(dims[2]):
            print("Element of np array with indices {} is equal to {}".format([i, j, k], y_true[i, j, k]))
If you don't know the rank of the tensor you want to print, then you can write a recursive function that will print all the elements, for example:
import numpy as np
def recursively_print_elems(np_arr, idx, pos):
    # Every index position is filled in: print the element and stop recursing.
    if pos >= len(np_arr.shape):
        print("Element of np array with indices {} is equal to: {}".format(idx, np_arr[tuple(idx)]))
        return
    # Otherwise try every index along dimension `pos` and recurse into the next one.
    for i in range(np_arr.shape[pos]):
        idx[pos] = i
        recursively_print_elems(np_arr, idx, pos + 1)
def print_elems(np_arr):
    idx = [0] * len(np_arr.shape)
    recursively_print_elems(np_arr, idx, 0)
y_true = np.array([[[1.], [0.], [3.]], [[5.], [0.], [0.]]])
print_elems(y_true)
The 2nd approach is more general: it will work for a tensor of any rank.
Your array:
In [19]: y_true
Out[19]:
array([[[1.],
[0.],
[3.]],
[[5.],
[0.],
[0.]]])
In [20]: y_true.shape
Out[20]: (2, 3, 1)
With a last dimension of size 1, we can reshape it
In [21]: y_true.reshape(2,3)
Out[21]:
array([[1., 0., 3.],
[5., 0., 0.]])
Selecting on that index does just as well.
But you can access all values in order just by raveling/flattening:
In [22]: y_true.ravel()
Out[22]: array([1., 0., 3., 5., 0., 0.])
Or get a 1-d iterator:
In [23]: yiter = y_true.flat
In [24]: yiter?
Type: flatiter
String form: <numpy.flatiter object at 0x1fdd200>
Length: 6
File: ~/.local/lib/python3.6/site-packages/numpy/__init__.py
Docstring: <no docstring>
Class docstring:
Flat iterator object to iterate over arrays.
A `flatiter` iterator is returned by ``x.flat`` for any array `x`.
It allows iterating over the array as if it were a 1-D array,
either in a for-loop or by calling its `next` method.
...
So instead of constructing an iterator for each dimension we can iterate on this flat one:
In [25]: for item in yiter:print(item)
1.0
0.0
3.0
5.0
0.0
0.0
ndenumerate uses this flat iterator, and returns both coordinates and values:
In [26]: list(np.ndenumerate(y_true))
Out[26]:
[((0, 0, 0), 1.0),
((0, 1, 0), 0.0),
((0, 2, 0), 3.0),
((1, 0, 0), 5.0),
((1, 1, 0), 0.0),
((1, 2, 0), 0.0)]
A variation on this is ndindex:
In [27]: indexs = np.ndindex(y_true.shape)
In [28]: for ijk in indexs:
    ...:     print(ijk, y_true[ijk])
    ...:
(0, 0, 0) 1.0
(0, 1, 0) 0.0
(0, 2, 0) 3.0
(1, 0, 0) 5.0
(1, 1, 0) 0.0
(1, 2, 0) 0.0
But where possible it is better to operate on the whole array, rather than iterate. Those whole-array operations do the iteration in compiled code.
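As a small illustration of that last point, here is a sketch that compares an explicit loop with whole-array equivalents on the same y_true. The particular operations (sum, count of non-zeros, boolean mask) are just examples picked for this sketch.

```python
import numpy as np

y_true = np.array([[[1.], [0.], [3.]], [[5.], [0.], [0.]]])

# Explicit Python-level iteration over every element.
total = 0.0
for value in y_true.flat:
    total += value

# Whole-array equivalents; the looping happens inside NumPy's compiled code.
array_total = y_true.sum()
nonzero_count = np.count_nonzero(y_true)
positives = y_true[y_true > 0]   # boolean mask selects the positive entries

print(total, array_total)
print(nonzero_count)
print(positives)
```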
Is there a numpy function to divide an array along an axis with elements from another array? For example, suppose I have an array a with shape (l,m,n) and an array b with shape (m,); I'm looking for something equivalent to:
def divide_along_axis(a,b,axis=None):
    if axis is None:
        return a/b
    c = a.copy()
    for i, x in enumerate(c.swapaxes(0,axis)):
        x /= b[i]
    return c
For example, this is useful when normalizing an array of vectors:
>>> a = np.random.randn(4,3)
array([[ 1.03116167, -0.60862215, -0.29191449],
[-1.27040355, 1.9943905 , 1.13515384],
[-0.47916874, 0.05495749, -0.58450632],
[ 2.08792161, -1.35591814, -0.9900364 ]])
>>> np.apply_along_axis(np.linalg.norm,1,a)
array([ 1.23244853, 2.62299312, 0.75780647, 2.67919815])
>>> c = divide_along_axis(a,np.apply_along_axis(np.linalg.norm,1,a),0)
>>> np.apply_along_axis(np.linalg.norm,1,c)
array([ 1., 1., 1., 1.])
For the specific example you've given: dividing an (l,m,n) array by (m,) you can use np.newaxis:
a = np.arange(1,61, dtype=float).reshape((3,4,5)) # Create a 3d array
a.shape # (3,4,5)
b = np.array([1.0, 2.0, 3.0, 4.0]) # Create a 1-d array
b.shape # (4,)
a / b # Gives a ValueError
a / b[:, np.newaxis] # The result you want
You can read all about the broadcasting rules here. You can also use newaxis more than once if required. (e.g. to divide a shape (3,4,5,6) array by a shape (3,5) array).
From my understanding of the docs, using newaxis + broadcasting also avoids any unnecessary array copying.
Indexing, newaxis, etc. are now described more fully here. (The documentation has been reorganised since this answer was first posted.)
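The (3,4,5,6) by (3,5) case mentioned above can be sketched as follows, together with one possible loop-free version of the question's divide_along_axis. The reshape-based helper is an illustration written for this sketch, not a NumPy function, and it only handles a 1-d b.

```python
import numpy as np

a = np.arange(1, 361, dtype=float).reshape((3, 4, 5, 6))
b = np.arange(1, 16, dtype=float).reshape((3, 5))

# Insert length-1 axes where b has no counterpart in a: (3, 5) -> (3, 1, 5, 1).
c = a / b[:, np.newaxis, :, np.newaxis]
print(c.shape)  # (3, 4, 5, 6)

def divide_along_axis(a, b, axis):
    """Divide a by the 1-d array b, lining b up with the given axis of a."""
    shape = [1] * a.ndim
    shape[axis] = -1              # b keeps its length on `axis`, 1 elsewhere
    return a / b.reshape(shape)

d = divide_along_axis(np.ones((3, 4, 5)), np.array([1.0, 2.0, 3.0, 4.0]), axis=1)
print(d[0, :, 0])
```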
I think you can get this behavior with numpy's usual broadcasting behavior:
In [9]: a = np.array([[1., 2.], [3., 4.]])
In [10]: a / np.sum(a, axis=0)
Out[10]:
array([[ 0.25 , 0.33333333],
[ 0.75 , 0.66666667]])
If I've interpreted the question correctly.
If you want the other axis you could transpose everything:
> a = np.random.randn(4,3).transpose()
> norms = np.apply_along_axis(np.linalg.norm,0,a)
> c = a / norms
> np.apply_along_axis(np.linalg.norm,0,c)
array([ 1., 1., 1., 1.])
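For the row-normalization example from the question, a sketch that avoids both the transpose and apply_along_axis is to ask np.linalg.norm to keep the reduced axis, so that the result broadcasts directly. The random input is only a stand-in for the question's array.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 3))

# keepdims=True leaves the norms with shape (4, 1), which broadcasts against (4, 3).
norms = np.linalg.norm(a, axis=1, keepdims=True)
c = a / norms

# Every row of c now has unit length.
print(np.linalg.norm(c, axis=1))
```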