Why does Keras raise a shape error in the last dense layer?

I built a simple NN to distinguish integers from decimals. My input data is a 1-dimensional array, and the final output should be the probability that the number is an integer.
At first, this worked when the last layer (name: output) had 1 unit. But it raised a ValueError when I changed the last dense layer to two units, because I wanted to output both the probability of a number x being an integer and of it being a decimal.
from tensorflow.python.keras.models import Sequential,load_model
from tensorflow.python.keras.utils import np_utils
from tensorflow.python.keras.layers import Dense
from tensorflow.python.keras.layers import Activation
from tensorflow import keras
import numpy as np
import tensorflow as tf
from sklearn.utils import shuffle
def train():
    t=[]
    a=[]
    for i in range(0,8000):  # generate some training data
        ran=np.random.randint(2)
        if(ran==0):
            y=np.random.uniform(-100,100)
            t.append(y)
            a.append(0)
        else:
            y=np.random.randint(1000)
            t.append(y)
            a.append(1)
    t=np.asarray(t)
    a=np.asarray(a)
    pt=t.reshape(-1,1)  # reshape for fit()
    pa=a.reshape(-1,1)
    pt,pa=shuffle(pt,pa)
    model=Sequential()
    dense=Dense(units=32,input_shape=(1,),activation='relu')
    dense2=Dense(units=64,activation='relu')
    output=Dense(units=2,activation='softmax')  # HERE is the problem
    model.add(dense)
    model.add(dense2)
    model.add(output)
    model.summary()
    model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
    model.fit(pt,pa,validation_split=0.02,batch_size=10, epochs=50, verbose=2)
    model.save('integer_predictor.h5')

train()
ValueError: Error when checking target: expected dense_2 to have shape (2,) but got array with shape (1,)

This should solve your problem
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
Since you have 2 output units, you can't use binary_crossentropy; it is now a 2-class classification problem. Also, because your labels are not one-hot encoded, you need sparse_categorical_crossentropy. If you one-hot encode the labels, categorical_crossentropy works with more than one output unit.
Read this to get more insight into this.
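For reference, a minimal sketch (not from the original answer) of the two consistent setups, assuming the same pt/pa arrays as in the question:
# Option 1: keep a single output unit with sigmoid and binary_crossentropy
model = Sequential()
model.add(Dense(units=32, input_shape=(1,), activation='relu'))
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(pt, pa, validation_split=0.02, batch_size=10, epochs=50, verbose=2)

# Option 2: two softmax units with integer labels (0 or 1) and the sparse loss
model = Sequential()
model.add(Dense(units=32, input_shape=(1,), activation='relu'))
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=2, activation='softmax'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(pt, pa, validation_split=0.02, batch_size=10, epochs=50, verbose=2)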

Related

Using Sparse Tensors as Input for Autoencoders

I have a one-hot-encoded sparse matrix that can't be converted into a dense matrix because of its size.
I would like to reduce its dimensionality using an autoencoder. Currently I am trying to use TensorFlow and its Keras library for that.
The TensorFlow docs state that sparse tensors exist and that they can be used in Keras (see https://www.tensorflow.org/guide/sparse_tensor).
The problem is that none of the autoencoders I've found on the internet seem to work with sparse tensors.
I have prepared a small code example which stops after the first training epoch with the error message: "Failed to convert elements of SparseTensor to Tensor. Consider casting elements to a supported type.".
My questions would be:
Do you have an idea how to improve the code, or ideally do you have an example I can look up?
If not: do you have other ideas on how to do what I would like to do (e.g. another library, another method, etc.)?
Code Example:
# necessary imports
import tensorflow as tf
from keras.models import Model, Sequential
from keras.layers import Input, Dense, ActivityRegularization
from tensorflow.keras import backend as K
from tensorflow.keras import regularizers

# example one-hot-encoded matrix with 10 records, each one out of 4 distinct categories
sparse_tensor = tf.sparse.SparseTensor(
    indices=[[0,3], [1,3], [2,0], [3,1], [4,0], [5,2], [6,2], [7,1], [8,3], [9,1]],
    values=[1 for i in range(10)],
    dense_shape=[10, 4])

encoder = Sequential([
    Input(shape=(4,), sparse=True),
    Dense(1, activation='relu'),
    ActivityRegularization(l1=1e-3)
])
decoder = Sequential([
    Dense(4, activation='sigmoid', input_shape=(1,)),
])
autoencoder = Sequential([encoder, decoder])
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x=sparse_tensor, y=sparse_tensor, epochs=5, batch_size=5, shuffle=True)
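One possible workaround, sketched below as an assumption rather than a verified fix from this thread, is to densify each record inside a tf.data pipeline so that Keras only ever sees ordinary dense batches (the encoder then no longer needs sparse=True):
# Sketch only: same sparse_tensor as above, but densified batch by batch.
dense_encoder = Sequential([
    Input(shape=(4,)),                 # plain dense input
    Dense(1, activation='relu'),
    ActivityRegularization(l1=1e-3)
])
dense_decoder = Sequential([Dense(4, activation='sigmoid', input_shape=(1,))])
dense_autoencoder = Sequential([dense_encoder, dense_decoder])
dense_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

dataset = tf.data.Dataset.from_tensor_slices((sparse_tensor, sparse_tensor))
dataset = dataset.map(lambda x, y: (tf.cast(tf.sparse.to_dense(x), tf.float32),
                                    tf.cast(tf.sparse.to_dense(y), tf.float32)))
dataset = dataset.batch(5)
dense_autoencoder.fit(dataset, epochs=5)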

Trying to understand TensorFlow input_shape

I have some confusion regarding TensorFlow's input_shape.
Suppose there are 3 documents (one per row) in "doc" defined below, and the vocabulary has 4 words (one per sublist in each row).
Further suppose that each word is represented by 2 numbers via a word embedding.
The program only works when I specify input_shape=(3,4,2) under a Dense layer.
But when I use an LSTM layer, the program only works when input_shape=(4,2), not when input_shape=(3,4,2).
So how should I specify the input shape for such inputs? How do I make sense of it?
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
doc=[
    [[1,0],[0,0],[0,0],[0,0]],
    [[0,0],[1,0],[0,0],[0,0]],
    [[0,0],[0,0],[1,0],[0,0]]
]
model=Sequential()
model.add(Dense(2,input_shape=(3,4,2))) # model.add(LSTM(2,input_shape=(4,2)))
model.compile(optimizer=Adam(learning_rate=0.0001),loss="sparse_categorical_crossentropy",metrics=("accuracy"))
model.summary()
output=model.predict(doc)
print(model.weights)
print(output)
The input_shape argument of a keras.layers.LSTM layer expects a 2D shape of [timesteps, features]. Your doc has the shape [batch_size, timesteps, features] and therefore one dimension too many.
You can use the batch_input_shape argument instead if you want to fix the batch_size, too.
To do so, you just have to replace this line of your code:
model.add(LSTM(2,input_shape=(4,2)))
With this one:
model.add(LSTM(2,batch_input_shape=(3,4,2)))
If you set a specific batch_size in your model and then feed it any number of samples other than 3 (in your case), you will get an error. Using input_shape instead gives you the flexibility to feed the network any batch size.
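For illustration, a minimal sketch (not part of the original answer) of the input_shape variant, assuming the same doc as in the question converted to a NumPy array:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

doc = np.array(doc)                     # shape (3, 4, 2): batch, timesteps, features
model = Sequential()
model.add(LSTM(2, input_shape=(4, 2)))  # the batch dimension is left out
output = model.predict(doc)             # works for any number of documents
print(output.shape)                     # (3, 2)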

Problem with shapes of experimental Tensorflow dataset

I am trying to store numpy arrays in a Tensorflow dataset. The model fits correctly when using the numpy arrays as train and test data but not when I store the numpy arrays in a single Tensorflow dataset. The problem is with the dimensions of the dataset. Something is wrong even though shapes seem OK at first sight.
After trying multiple things to reshape my Tensorflow dataset, I am still unable to get it working. My code is the following:
train_x.shape
Out[54]: (7200, 40)
train_y.shape
Out[55]: (7200,)
dataset = tf.data.Dataset.from_tensor_slices((train_x,train_y))
print(dataset)
Out[56]: <TensorSliceDataset shapes: ((40,), ()), types: (tf.int32, tf.int32)>
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
history = model.fit(dataset, epochs=EPOCHS, batch_size=256)
(traceback excerpt from sparse_softmax_cross_entropy_with_logits)
ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (40, 1351)).
I have seen this answer, but I am sure it doesn't apply here. I must use sparse_categorical_crossentropy. I am basing my code on this example, in which the train and test data are stored in a TensorFlow dataset. I also want to store the arrays in a dataset because I will need it later.
You can't use batch_size with model.fit() when using a tf.data.Dataset. Instead use tf.data.Dataset.batch(). You'll have to change your code as follows for it to work.
import numpy as np
import tensorflow as tf
# Some toy data
train_x = np.random.normal(size=(7200, 40))
train_y = np.random.choice([0,1,2], size=(7200))
dataset = tf.data.Dataset.from_tensor_slices((train_x,train_y))
dataset = dataset.batch(256)
#### - Define your model here - ####
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
history = model.fit(dataset, epochs=EPOCHS)
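The "# Define your model here" placeholder could, for example, be filled with something like the following; this is only a hypothetical model consistent with the toy data above (40 features, 3 classes), not part of the original answer:
EPOCHS = 5
optimizer = tf.keras.optimizers.Adam()
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(40,)),
    tf.keras.layers.Dense(3, activation='softmax')   # 3 classes in the toy labels
])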

Learning a Categorical Variable with TensorFlow Probability

I would like to use TFP to write a neural network whose output is the probabilities of a categorical variable with 3 classes, and train it using the negative log-likelihood.
As I'm taking my first steps with TF and TFP, I started with a toy model where the input layer has only 1 unit receiving a null input, and the output layer has 3 units with a softmax activation function. The idea is that the biases should learn (up to an additive constant) the log of the probabilities.
Below is my code; true_p are the true parameters I use to generate the data and would like to learn, while learned_p is what I get from the NN.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from functions import nll
from tensorflow.keras.optimizers import SGD
import tensorflow.keras.layers as layers
import tensorflow_probability as tfp
tfd = tfp.distributions
# params
true_p = np.array([0.1, 0.7, 0.2])
n_train = 1000
# training data
x_train = np.array(np.zeros(n_train)).reshape((n_train,))
y_train = np.array(np.random.choice(len(true_p), size=n_train, p=true_p)).reshape((n_train,))
# model
input_layer = layers.Input(shape=(1,))
p_layer = layers.Dense(len(true_p), activation=tf.nn.softmax)(input_layer)
p_y = tfp.layers.DistributionLambda(tfd.Categorical)(p_layer)
model_p = keras.models.Model(inputs=input_layer, outputs=p_y)
model_p.compile(SGD(), loss=nll)
# training
hist_p = model_p.fit(x=x_train, y=y_train, batch_size=100, epochs=3000, verbose=0)
# check result
learned_p = np.round(model_p.layers[1].call(tf.constant([0], shape=(1, 1))).numpy(), 3)
learned_p
With this setup, I get the result:
>>> learned_p
array([[0.005, 0.989, 0.006]], dtype=float32)
I over-estimate the second category and can't really distinguish between the first and the third one. What's worse, if I plot the probabilities at the end of each epoch, it looks like they converge monotonically to the vector [0,1,0], which doesn't make sense (it seems to me the gradient should push in the opposite direction once I start to over-estimate).
I really can't figure out what's going on here, but I have the feeling I'm doing something plainly wrong. Any idea? Thank you for your help!
For the record, I also tried using other optimizers like Adam or Adagrad playing with the hyper-params, but with no luck.
I'm using Python 3.7.9, TensorFlow 2.3.1 and TensorFlow probability 0.11.1
I believe the default argument to Categorical is not the vector of probabilities, but the vector of logits (values you'd take softmax of to get probabilities). This is to help maintain precision in internal Categorical computations like log_prob. I think you can simply eliminate the softmax activation function and it should work. Please update if it doesn't!
EDIT: alternatively you can replace the tfd.Categorical with
lambda p: tfd.Categorical(probs=p)
but you'll lose the aforementioned precision gains. Just wanted to clarify that passing probs is an option, just not the default.
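A minimal sketch of both options, assuming the same setup and the same nll loss as in the question (nll is the asker's own function, presumably something like lambda y, rv: -rv.log_prob(y)):
# Option 1 (the suggestion above): drop the softmax so the Dense layer outputs
# logits, which is what tfd.Categorical takes as its first argument.
logit_layer = layers.Dense(len(true_p))(input_layer)
p_y = tfp.layers.DistributionLambda(tfd.Categorical)(logit_layer)

# Option 2: keep the softmax and pass the probabilities explicitly
# (this loses the precision advantage of working with logits).
prob_layer = layers.Dense(len(true_p), activation=tf.nn.softmax)(input_layer)
p_y = tfp.layers.DistributionLambda(lambda p: tfd.Categorical(probs=p))(prob_layer)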

Getting TypeError while training a classifier for iris flower dataset

I am trying to experiment by using a linear output layer to classify the iris flower dataset as a regression problem, with target values 0, 1 and 2.
I am using one hidden tanh activation layer and then a linear layer. I deliberately tried this instead of one-hot encoding the labels, because I want to compare the scores returned by the 'model' function of my code, as I am new to TensorFlow. On running the code below...
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import tensorflow as tf
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
data=load_iris()
X=data['data']
Y=data['target']
pca=PCA(n_components=2)
X=pca.fit_transform(X)
#visualise the data
#plt.figure(figsize=(12,12))
#plt.scatter(X[:,0],X[:,1],c=Y,alpha=0.4)
#plt.show()
labels=Y.reshape(-1,1)
x_train,x_test,y_train,y_test=train_test_split(X,Y,test_size=0.3,random_state=42)
y_train=y_train.reshape(-1,1)
y_test=y_test.reshape(-1,1)
hidden_nodes=5
batch_size=100
num_features=2
lr=0.01
g=tf.Graph()
with g.as_default():
    tf_train_dataset=tf.placeholder(tf.float32,shape=[None,num_features])
    tf_train_labels=tf.placeholder(tf.float32,shape=[None,1])
    tf_test_dataset=tf.constant(x_test,dtype=tf.float32)

    layer1_weights=tf.Variable(tf.truncated_normal([num_features,hidden_nodes]),dtype=tf.float32)
    layer1_biases=tf.Variable(tf.zeros([hidden_nodes]),dtype=tf.float32)
    layer2_weights=tf.Variable(tf.truncated_normal([hidden_nodes,1]),dtype=tf.float32)
    layer2_biases=tf.Variable(tf.zeros([1]),dtype=tf.float32)

    def model(data):
        Z1=tf.matmul(data,layer1_weights)+layer1_biases
        A1=tf.nn.relu(Z1)
        Z2=tf.matmul(A1,layer2_weights)+layer2_biases
        return Z2

    model_scores=model(tf_train_dataset)
    loss=tf.reduce_mean(tf.losses.mean_squared_error(model_scores,tf_train_labels))
    optimizer=tf.train.GradientDescentOptimizer(lr).minimize(loss)
    #train_prediction=model_scores
    test_prediction=(tf_test_dataset)

    num_steps=10001
    with tf.Session() as sess:
        init=tf.global_variables_initializer()
        sess.run(init)
        for step in range(num_steps):
            offset=(step*batch_size)%(y_train.shape[0]-batch_size)
            minibatch_data=x_train[offset:(offset+batch_size),:]
            minibatch_labels=y_train[offset:(offset+batch_size)]
            feed_dict={tf_train_dataset:minibatch_data,tf_train_labels:minibatch_labels}
            ll,loss,scores=sess.run([optimizer,loss,model_scores],feed_dict=feed_dict)
            if step%1000==0:
                print('Minibatch loss at step {}:{}'.format(step,loss))
I get an error on this line:
ll,loss,scores=sess.run([optimizer,loss,model_scores],feed_dict=feed_dict)
TypeError: Fetch argument 14.686994 has invalid type <class 'numpy.float32'>, must be a string or Tensor. (Can not convert a float32 into a Tensor or Operation.)
Why is this error occurring? Is it because of this line?
model_scores=model(tf_train_dataset)
How should I go about solving this issue? And can't the return value of the model function be a tensor, or be cast to a tensor?
Thanks.
That is because of this line:
ll,loss,scores=sess.run([optimizer,loss,model_scores],feed_dict=feed_dict)
You replace the loss tensor with the loss value returned by sess.run(), so on the next iteration sess.run() is asked to fetch a plain float instead of a tensor. Just use a different variable to store the loss value.
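For example, a sketch that keeps the rest of the training loop unchanged:
# fetch the loss under a different name so the `loss` tensor is not overwritten
_, loss_value, scores = sess.run([optimizer, loss, model_scores], feed_dict=feed_dict)
if step % 1000 == 0:
    print('Minibatch loss at step {}: {}'.format(step, loss_value))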