Custom word2vec Transformer on pandas dataframe and using it in FeatureUnion - pandas

For the below pandas DataFrame df, I want to transform the type column to OneHotEncoding, and transform the word column to its vector representation using the dictionary word2vec. Then I want to concatenate the two transformed vectors with the count column to form the final feature for classification.
>>> df
word type count
0 apple A 4
1 cat B 3
2 mountain C 1
>>> df.dtypes
word object
type category
count int64
>>> word2vec
{'apple': [0.1, -0.2, 0.3], 'cat': [0.2, 0.2, 0.3], 'mountain': [0.4, -0.2, 0.3]}
I defined customized Transformer, and use FeatureUnion to concatenate the features.
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import OneHotEncoder
class w2vTransformer(TransformerMixin):
def __init__(self,word2vec):
self.word2vec = word2vec
def fit(self,x, y=None):
return self
def wv(self, w):
return self.word2vec[w] if w in self.word2vec else [0, 0, 0]
def transform(self, X, y=None):
return df['word'].apply(self.wv)
pipeline = Pipeline([
('features', FeatureUnion(transformer_list=[
# Part 1: get integer column
('numericals', Pipeline([
('selector', TypeSelector(np.number)),
])),
# Part 2: get category column and its onehotencoding
('categoricals', Pipeline([
('selector', TypeSelector('category')),
('labeler', StringIndexer()),
('encoder', OneHotEncoder(handle_unknown='ignore')),
])),
# Part 3: transform word to its embedding
('word2vec', Pipeline([
('w2v', w2vTransformer(word2vec)),
]))
])),
])
When I run pipeline.fit_transform(df), I got the error: blocks[0,:] has incompatible row dimensions. Got blocks[0,2].shape[0] == 1, expected 3.
However, if I removed the word2vec Transformer (Part 3) from the pipeline, the pipeline (Part1 1 + Part 2) works fine.
>>> pipeline_no_word2vec.fit_transform(df).todense()
matrix([[4., 1., 0., 0.],
[3., 0., 1., 0.],
[1., 0., 0., 1.]])
And if I keep only the w2v transformer in the pipeline, it also works.
>>> pipeline_only_word2vec.fit_transform(df)
array([list([0.1, -0.2, 0.3]), list([0.2, 0.2, 0.3]),
list([0.4, -0.2, 0.3])], dtype=object)
My guess is that there is something wrong in my w2vTransformer class but don't know how to fix it. Please help.

This error is due to the fact that the FeatureUnion expects a 2-d array from each of its parts.
Now the first two parts of your FeatureUnion:- 'numericals' and 'categoricals' are correctly sending 2-d data of shape (n_samples, n_features).
n_samples = 3 in your example data. n_features will depend on individual parts (like OneHotEncoder will change them in 2nd part, but will be 1 in first part).
But the third part 'word2vec' returns a pandas.Series object which have the 1-d shape (3,). FeatureUnion takes this a shape (1, 3) by default and hence the complains that it does not match other blocks.
So you need to correct that shape.
Now even if you simply do a reshape() at the end and change it to shape (3,1), your code will not run, because the internal contents of that array are lists from your word2vec dict, which are not transformed correctly to a 2-d array. Instead it will become a array of lists.
Change the w2vTransformer to correct the error:
class w2vTransformer(TransformerMixin):
...
...
def transform(self, X, y=None):
return np.array([np.array(vv) for vv in X['word'].apply(self.wv)])
And after that the pipeline will work.

Related

batch axis in keras custom layer

I want to make a custom layer that does the following, given a batch of input vectors.
For each vector a in the batch:
get the first element a[0].
multiply the vector a by a[0] elementwise.
So if the batch is
[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[10., 11., 12.]]
This should be a batch of 4 vectors, each with dimension 3 (or am I wrong here?).
Then my layer should transform the batch to the following:
[[ 1., 2., 3.],
[ 16., 20., 24.],
[ 49., 56., 63.],
[100., 110., 120.]]
Here is my implementation for the layer:
class MyLayer(keras.layers.Layer):
def __init__(self, activation=None, **kwargs):
super().__init__(**kwargs)
self.activation = keras.activations.get(activation)
def call(self, a):
scale = a[0]
return self.activation(a * scale)
def get_config(self):
base_config = super().get_config()
return {**base_config,
"activation": keras.activations.serialize(self.activation)}
But the output is different from what I expected:
batch = tf.Variable([[1,2,3],
[4,5,6],
[7,8,9],
[10,11,12]], dtype=tf.float32)
layer = MyLayer()
print(layer(batch))
Output:
tf.Tensor(
[[ 1. 4. 9.]
[ 4. 10. 18.]
[ 7. 16. 27.]
[10. 22. 36.]], shape=(4, 3), dtype=float32)
It looks like the implementation actually treats each column as a vector, which is strange to me because other pre-written models, such as the sequential model, specify the input shape to be (batch_size, ...), which means each row, instead of column, is a vector.
How should I modify my code so that it behaves the way I want?
Actually, your input shape is (4,3). So when you slice this tensor by a[0] it gets the first row which is [1,2,3]. To get what you want you should instead get the first column and then transpose your matrix to give you the desired matrix like this:
def call(self, a):
scale = a[:,0]
return tf.transpose(self.activation(tf.transpose(a) * scale))

How to interpret get_weights for Keras GRU?

I am unable to interpret the results of get_weights from a GRU layer. Here's my code -
#Modified from - https://machinelearningmastery.com/understanding-simple-recurrent-neural-networks-in-keras/
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, GRU
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt
model = Sequential()
model.add(GRU(units = 2, input_shape = (3,1), activation = 'linear'))
model.add(Dense(units = 1, activation = 'linear'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
initial_weights = model.layers[0].get_weights()
print("Shape = ",initial_weights)
I am familiar with GRU concepts. In addition, I understand how the get_weights work for Keras Simple RNN layer, where the first array represents the input weights, the second the activation weights and the third the bias. However, I am lost with output of GRU, which is given below -
Shape = [array([[-0.64266175, -0.0870676 , -0.25356603, -0.03685969, 0.22260845,
-0.04923642]], dtype=float32), array([[ 0.01929092, -0.4932567 , 0.3723044 , -0.6559699 , -0.33790302,
0.27062896],
[-0.4214194 , 0.46456426, 0.27233726, -0.00461334, -0.6533575 ,
-0.32483965]], dtype=float32), array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]], dtype=float32)]
I am assuming it has something to do with GRU gates.
Update:7/4 - This page says that keras GRU has 3 gates, update, reset and output. However, based on this, GRU shouldn't have the output gate.
Best way I know would be to track the add_weight() calls in the build() function of the GRUCell.
Let's take an example model,
model = tf.keras.models.Sequential(
[
tf.keras.layers.GRU(32, input_shape=(5, 10), name='gru'),
tf.keras.layers.Dense(10)
]
)
How we'll print some metadata about what's returned by weights = model.get_layer('gru').get_weights(). Which gives,
Number of arrays in weights: 3
Shape of each array in weights: [(10, 96), (32, 96), (2, 96)]
Let's go back to what weights defined by the GRUCell. We got,
self.kernel = self.add_weight(
shape=(input_dim, self.units * 3),
...
)
self.recurrent_kernel = self.add_weight(
shape=(self.units, self.units * 3),
...
)
...
bias_shape = (2, 3 * self.units)
self.bias = self.add_weight(
shape=bias_shape,
...
)
This is what you're seeing as weights (in that order). Here's why they are shaped like this. GRU computations are outlined here.
The first matrix in weights (of shape [10, 96]) is a concatenation of Wz|Wr|Wh (in that order). Each of these is a [10, 32] sized tensor. Concatenation gives a [10, 32*3=96] sized tensor.
Similarly, the second matrix is a concatenation of Uz|Ur|Uh. Each of these is a [32, 32] sized tensor which becomes [32, 96] after concatenation.
You can see how they break this combined weight matrix to each of z, r and h components here.
Finally the bias. It contains 2 biases i.e. [2, 96] sized tensor; input_bias and recurrent_bias. Again, biases from all gates/weights are combined to a single tensor. Typically, only the input_bias is used. But if you have reset_after (decides how the reset gate is applied) set to True, then the recurrent_bias gets used. It's an implementation detail.

From softmax output to class prediction

Is there an easy way to go from a Softmax output to a class prediction?
For instance,
from this:
[0.83128697, 0.06161868, 0.10709436]
to this:
[1, 0, 0]
You can use np.argmax to retrieve the index of max value:
import numpy as np
a = [0.83128697, 0.06161868, 0.10709436]
r = np.zeros(len(a)) # a.size if a is a numpy array
r[np.argmax(a)]=1
r
array([1., 0., 0.])

Select weight of action from a tensorflow model

I have a small model used in a reinforcement learning context.
I can input a 2d tensor of states, and I get a 2d tensor of action weigths.
Let say I input two states and I get the following action weights out:
[[0.1, 0.2],
[0.3, 0.4]]
Now I have another 2d tensor which have the action number from which I want to get the weights:
[[1],
[0]]
How can I use this tensor to get the weight of actions?
In this example I'd like to get:
[[0.2],
[0.3]]
Similar to Tensorflow tf.gather with axis parameter, the indices are handled little different here:
a = tf.constant( [[0.1, 0.2], [0.3, 0.4]])
indices = tf.constant([[1],[0]])
# convert to full indices
full_indices = tf.stack([tf.range(indices.shape[0])[...,tf.newaxis], indices], axis=2)
# gather
result = tf.gather_nd(a,full_indices)
with tf.Session() as sess:
print(sess.run(result))
#[[0.2]
#[0.3]]
A simple way to do this is squeeze the dimensions of indices, element-wise multiply with corresponding one-hot vector and then expand the dimensions later.
import tensorflow as tf
weights = tf.constant([[0.1, 0.2], [0.3, 0.4]])
indices = tf.constant([[1], [0]])
# Reduce from 2d (2, 1) to 1d (2,)
indices1d = tf.squeeze(indices)
# One-hot vector corresponding to the indices. shape (2, 2)
action_one_hot = tf.one_hot(indices=indices1d, depth=weights.shape[1])
# Element-wise multiplication and sum across axis 1 to pick the weight. Shape (2,)
action_taken_weight = tf.reduce_sum(action_one_hot * weights, axis=1)
# Expand the dimension back to have a 2d. Shape (2, 1)
action_taken_weight2d = tf.expand_dims(action_taken_weight, axis=1)
sess = tf.InteractiveSession()
print("weights\n", sess.run(weights))
print("indices\n", sess.run(indices))
print("indices1d\n", sess.run(indices1d))
print("action_one_hot\n", sess.run(action_one_hot))
print("action_taken_weight\n", sess.run(action_taken_weight))
print("action_taken_weight2d\n", sess.run(action_taken_weight2d))
Should give you the following output:
weights
[[0.1 0.2]
[0.3 0.4]]
indices
[[1]
[0]]
indices1d
[1 0]
action_one_hot
[[0. 1.]
[1. 0.]]
action_taken_weight
[0.2 0.3]
action_taken_weight2d
[[0.2]
[0.3]]
Note: You can also do action_taken_weight = tf.reshape(action_taken_weight, tf.shape(indices)) instead of expand_dims.

numpy divide along axis

Is there a numpy function to divide an array along an axis with elements from another array? For example, suppose I have an array a with shape (l,m,n) and an array b with shape (m,); I'm looking for something equivalent to:
def divide_along_axis(a,b,axis=None):
if axis is None:
return a/b
c = a.copy()
for i, x in enumerate(c.swapaxes(0,axis)):
x /= b[i]
return c
For example, this is useful when normalizing an array of vectors:
>>> a = np.random.randn(4,3)
array([[ 1.03116167, -0.60862215, -0.29191449],
[-1.27040355, 1.9943905 , 1.13515384],
[-0.47916874, 0.05495749, -0.58450632],
[ 2.08792161, -1.35591814, -0.9900364 ]])
>>> np.apply_along_axis(np.linalg.norm,1,a)
array([ 1.23244853, 2.62299312, 0.75780647, 2.67919815])
>>> c = divide_along_axis(a,np.apply_along_axis(np.linalg.norm,1,a),0)
>>> np.apply_along_axis(np.linalg.norm,1,c)
array([ 1., 1., 1., 1.])
For the specific example you've given: dividing an (l,m,n) array by (m,) you can use np.newaxis:
a = np.arange(1,61, dtype=float).reshape((3,4,5)) # Create a 3d array
a.shape # (3,4,5)
b = np.array([1.0, 2.0, 3.0, 4.0]) # Create a 1-d array
b.shape # (4,)
a / b # Gives a ValueError
a / b[:, np.newaxis] # The result you want
You can read all about the broadcasting rules here. You can also use newaxis more than once if required. (e.g. to divide a shape (3,4,5,6) array by a shape (3,5) array).
From my understanding of the docs, using newaxis + broadcasting avoids also any unecessary array copying.
Indexing, newaxis etc are described more fully here now. (Documentation reorganised since this answer first posted).
I think you can get this behavior with numpy's usual broadcasting behavior:
In [9]: a = np.array([[1., 2.], [3., 4.]])
In [10]: a / np.sum(a, axis=0)
Out[10]:
array([[ 0.25 , 0.33333333],
[ 0.75 , 0.66666667]])
If i've interpreted correctly.
If you want the other axis you could transpose everything:
> a = np.random.randn(4,3).transpose()
> norms = np.apply_along_axis(np.linalg.norm,0,a)
> c = a / norms
> np.apply_along_axis(np.linalg.norm,0,c)
array([ 1., 1., 1., 1.])