Tries to understand Tensorflow input_shape - tensorflow

I have some confusions regarding to Tensorflow input_shape.
Suppose there are 3 documents (each row) in "doc" defined below, and the vocabulary has 4 words (each sublist in each row).
Further suppose that each word is represented by 2 numbers via word embedding.
The program only works when I specify input_shape=(3,4,2) under a Dense layer.
But when I use a LSTM layer, the program only works when input_shape=(4,2) but not when input_shape=(3,4,2).
So how to specify the input shape for such inputs? How to make sense of it?
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
doc=[
[[1,0],[0,0],[0,0],[0,0]],
[[0,0],[1,0],[0,0],[0,0]],
[[0,0],[0,0],[1,0],[0,0]]
]
model=Sequential()
model.add(Dense(2,input_shape=(3,4,2))) # model.add(LSTM(2,input_shape=(4,2)))
model.compile(optimizer=Adam(learning_rate=0.0001),loss="sparse_categorical_crossentropy",metrics=("accuracy"))
model.summary()
output=model.predict(doc)
print(model.weights)
print(output)

The input_shape argument in a keras.layers.LTSM layer expects a 2D array with a shape of [timesteps, features]. Your doc has the shape [batch_size, timesteps, features] and therefore one dimension too much.
You can use the batch_input_shape argument instead, if you want feed batch_size, too.
To do so, you have just to replace this line of your code:
model.add(LSTM(2,input_shape=(4,2)))
With this one:
model.add(LSTM(2,batch_input_shape=(3,4,2)))
If you're setting a specific batch_size in your model and then feed a different size other than 3 (in your case), you will get an error. Using input_shape instead you have the flexibility to feed any batch size to the network.

Related

Using Sparse Tensors as Input for Autoencoders

I have an One-hot-encoded sparse matrix which can't be transformed into a normal matrix due to its size.
I would like to reduce the dimensions using an autoencoder. Currently I am trying to use Tensorflow and its Keras library for that.
The Tensorflow docs state that sparse tensors exist and that they can be used in Keras (see https://www.tensorflow.org/guide/sparse_tensor).
The Problem is that all autoencoders I've found in the internet do not seem to work with sparse tensors.
I have prepared a small code example which stops after the first training epoch with the error message: "Failed to convert elements of SparseTensor to Tensor. Consider casting elements to a supported type.".
My Questions would be:
Do you have an idea to improve the Code or ideally do you have an example which I can look up?
If not: Do you have other ideas on how to do what I would like to do (e.g. another library, other method, etc.)?
Code Example:
#necessary imports
import tensorflow as tf
from keras.models import Model, Sequential
from keras.layers import Input, Dense, ActivityRegularization
from tensorflow.keras import backend as K
from tensorflow.keras import regularizers
#example one-hot-encoded matrix with 10 records with each one out of 4 distinct categories
sparse_tensor = tf.sparse.SparseTensor(indices=[[0,3], [1,3], [2,0], [3,1], [4,0], [5,2], [6,2], [7,1], [8,3], [9,1]],
values=[1 for i in range(10)],
dense_shape=[10, 4])
encoder = Sequential([
Input(shape=(4,), sparse=True),
Dense(1, activation = 'relu'),
ActivityRegularization(l1=1e-3)
])
decoder = Sequential([
Dense(4, activation = 'sigmoid', input_shape = (1, )),
])
autoencoder = Sequential([encoder, decoder])
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x=sparse_tensor, y=sparse_tensor, epochs=5, batch_size=5, shuffle=True)

Why Keras raises shape error in last dense layer?

I built a simple NN to distinguish integers from decimals, my input data is 1 dimensional array,and the final output should be the probability of integer.
At first, I succeeded when last layer(name:output) had 1 unit. But it raised ValueError when I changed the last dense layer to two units,for I wanted to output both probabilities of number x as integer and decimal.
from tensorflow.python.keras.models import Sequential,load_model
from tensorflow.python.keras.utils import np_utils
from tensorflow.python.keras.layers import Dense
from tensorflow.python.keras.layers import Activation
from tensorflow import keras
import numpy as np
import tensorflow as tf
from sklearn.utils import shuffle
def train():
t=[]
a=[]
for i in range (0,8000): #generate some training data
ran=np.random.randint(2)
if(ran==0):
y=np.random.uniform(-100,100)
t.append(y)
a.append(0)
else:
y=np.random.randint(1000)
t.append(y)
a.append(1)
t=np.asarray(t)
a=np.asarray(a)
pt=t.reshape(-1,1) #reshape for fit()
pa=a.reshape(-1,1)
pt,pa=shuffle(pt,pa)
model=Sequential()
dense=Dense(units=32,input_shape=(1,),activation='relu')
dense2=Dense(units=64,activation='relu')
output=Dense(units=2,activation='softmax') # HERE is the problem
model.add(dense)
model.add(dense2)
model.add(output)
model.summary()
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
model.fit(pt,pa,validation_split=0.02,batch_size=10, epochs=50, verbose=2)
model.save('integer_predictor.h5')
train()
ValueError: Error when checking target: expected dense_2 to have shape (2,) but got array with shape (1,)
This should solve your problem
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
Since you have 2 outputs, you cant use binary cross_entropy since its a 2 class classification problem. Also, when your inputs are not one-hot encoded you will need sparse_categorical_crossentropy. If you have one hot features then categorical_crossentropy will work with outputs > 1.
Read this to get more insight into this.

tf.keras loss from two images in serial

I want to use the stability training approach of the paper and apply it to a very simple CNN.
The principle architecture is given by:
As shown in the figure you compute the loss based on the output f(I) for the input image I and on
the output f(I') for the perturbed image I'.
My question would be how to do this in a valid way without having two instances of the DNN,
as I'm training on large 3D images. In other words: how can I process two images in serial and compute the loss based on those two images?
I'm using tf2 with keras.
You can first write your DNN as a tf.keras Model.
After that, you can write another model which takes two image inputs, applies some Gaussian noise to one, passes them to DNN.
Design a custom loss function which finds the proper loss from the two outputs.
Here's a demo code:
from tensorflow.keras.layers import Input, Dense, Add, Activation, Flatten
from tensorflow.keras.models import Model
import tensorflow as tf
import numpy as np
import random
from tensorflow.python.keras.layers import Input, GaussianNoise, BatchNormalization
# shared DNN, this is the base model with a feature-space output, there is only once instance of the model
ip = Input(shape=(32,32,1)) # same as original inputs
f0 = Flatten()(ip)
d0 = Dense(10)(f0) # 10 dimensional feature embedding
dnn = Model(ip, d0)
# final model with two version of images and loss
input_1 = Input(shape=(32,32,1))
input_2 = Input(shape=(32,32,1))
g0 = GaussianNoise(0.5)(input_2) # only input_2 passes through gaussian noise layer, you can design your own custom layer too
# passing the two images to same DNN
path1 = dnn(input_1) # no noise
path2 = dnn(g0) # noise
model = Model([input_1, input_2], [path1, path2])
def my_loss(y_true, y_pred):
# calculate your loss based on your two outputs path1, path2
pass
model.compile('adam', my_loss)
model.summary()

Unable to track record by record processing in LSTM algorithm for text classification?

We are working on multi-class text classification and following is the process which we have used.
1) We have created 300 dim's vector with word2vec word embedding using our own data and then passed that vector as a weights to LSTM embedding layer.
2) And then we have used one LSTM layer and one dense layer.
Here below is my code:
input_layer = layers.Input((train_seq_x.shape[1], ))
embedding_layer = layers.Embedding(len(word_index)+1, 300, weights=[embedding_matrix], trainable=False)(input_layer)
embedding_layer = layers.SpatialDropout1D(0.3)(embedding_layer)
lstm_layer1 = layers.LSTM(300,return_sequences=True,activation="relu")(embedding_layer)
lstm_layer1 = layers.Dropout(0.5)(lstm_layer1)
flat_layer = layers.Flatten()(lstm_layer1)
output_layer = layers.Dense(33, activation="sigmoid")(flat_layer)
model = models.Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer=optimizers.Adam(), loss='categorical_crossentropy',metrics=['accuracy'])
Please help me out on the below questions:
Q1) Why did we pass word embedding vector(300 dim's) as weights in LSTM embedding layer?
Q2) How can we know optimal number of neural in LSTM layer?
Q3) Can you please explain how the single record processing in LSTM algorithm?
Please let me know if you requires more information on the same.
Q1) Why did we pass word embedding vector(300 dim's) as weights in
LSTM embedding layer?
In a very simplistic way, you can think of an embedding layers as a lookup table which converts a word (represented by its index in a dictionary) to a vector. It is a trainable layers. Since you have already trained word embeddings instead of initializing the embedding layer with the random weight you initialize it with the vectors you have learned.
Embedding(len(word_index)+1, 300, weights=[embedding_matrix], trainable=False)(input_layer)
So here you are
creating an embedding layer or a look up table which can lookup words
indices 0 to len(word_index).
Each lookuped up word will map to a vector of size 300.
This lookup table is loaded with the vectors from "embedding_matrix"
(which is a pretrained model).
trainable=False will freez the weight in this layer.
You have passed 300 because it is the vector size of your pretrained model (embedding_matrix)
Q2) How can we know optimal number of neural in LSTM layer?
You have created a LSTM layer with takes 300 size vector as input and returns a vector of size 300. The output size and number of stacked LSTMS are hyperparameters which is tuned manually (usually using KFold CV)
Q3) Can you please explain how the single record processing in LSTM
algorithm?
A single record/sentence(s) are converted into indices of the vocabulary. So for every sentence you have an array of indices.
A batch of these sentences are created and feed as input to the model.
LSTM is unwrapped by passing in one index at a time as input at each timestep.
Finally the ouput of the LSTM is forward propagated by a final dense
layer to size 33. So looks like each input is mapped to one of 33
classes in your case.
Simple example
import numpy as np
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten, LSTM
from keras.layers.embeddings import Embedding
from nltk.lm import Vocabulary
from keras.utils import to_categorical
training_data = [ "it was a good movie".split(), "it was a bad movie".split()]
training_target = [1, 0]
v = Vocabulary([word for s in training_data for word in s])
model = Sequential()
model.add(Embedding(len(v),50,input_length = 5, dropout = 0.2))
model.add(LSTM(10, dropout_U = 0.2, dropout_W = 0.2))
model.add(Dense(2,activation='softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
print(model.summary())
x = np.array([list(map(lambda x: v[x], s)) for s in training_data])
y = to_categorical(training_target)
model.fit(x,y)

Implementing a many-to-many LSTM in TensorFlow?

I am using TensorFlow to make predictions on time-series data. So it is like I have 50 tags and I want to find out the next possible 5 tags.
As shown in the following picture, I want to make it like the 4th structure.
I went through the tutorial demo: Recurrent Neural Networks
But I found it can provide like the 5th one in the above picture, which is different.
I am wondering which model could I use? I am thinking of the seq2seq models, but not sure if it is the right way.
You are right that you can use a seq2seq model. For brevity I've written up an example of how you can do it in Keras which also has a Tensorflow backend. I've not run the example so it might need tweaking. If your tags are one-hot you need to use cross-entropy loss instead.
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector
# The input shape is your sequence length and your token embedding size
inputs = Input(shape=(seq_len, embedding_size))
# Build a RNN encoder
encoder = LSTM(128, return_sequences=False)(inputs)
# Repeat the encoding for every input to the decoder
encoding_repeat = RepeatVector(5)(encoder)
# Pass your (5, 128) encoding to the decoder
decoder = LSTM(128, return_sequences=True)(encoding_repeat)
# Output each timestep into a fully connected layer
sequence_prediction = TimeDistributed(Dense(1, activation='linear'))(decoder)
model = Model(inputs, sequence_prediction)
model.compile('adam', 'mse') # Or categorical_crossentropy
model.fit(X_train, y_train)