Categorical crossentropy and label encoding - tensorflow

I'm trying to build a multiclass classifier whose classes are ['A','B','C','D','E','F','G'].
Could someone explain the following error message?
"ValueError: You are passing a target array of shape (79, 1) while using as loss categorical_crossentropy. categorical_crossentropy expects targets to be binary matrices (1s and 0s) of shape (samples, classes). If your targets are integer classes, you can convert them to the expected format via:
from keras.utils.np_utils import to_categorical
y_binary = to_categorical(y_int)
Alternatively, you can use the loss function sparse_categorical_crossentropy instead, which does expect integer targets."
My code:
# Part 1 - Data Preprocessing
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataa = pd.read_csv('test_out.csv')
XX = dataa.iloc[:, 0:4].values
yy = dataa.iloc[:, 4].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_Y_1 = LabelEncoder()
yy = labelencoder_Y_1.fit_transform(yy)
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(XX, yy, test_size = 0.2,
random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Part 2 - Now let's make the ANN!
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu',
input_dim = 4))
# Adding the second hidden layer
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(output_dim = 1, init = 'uniform', activation =
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy',
metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, nb_epoch = 50)
# Part 3 - Making the predictions and evaluating the model
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

The problem lies in this portion of your code,
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_Y_1 = LabelEncoder()
yy = labelencoder_Y_1.fit_transform(yy)
You forgot to one-hot encode yy. Note that LabelEncoder only maps your categorical labels to integers, i.e. [A, B, C, D, E, F, G] to [0, 1, 2, 3, 4, 5, 6]. You have to one-hot encode them because you want to use a softmax activation with categorical_crossentropy (I'm over-simplifying, but that's the gist).
So it should have been like this:
# Encoding categorical data
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
labelencoder_Y_1 = LabelEncoder()
yy = labelencoder_Y_1.fit_transform(yy)
yy = to_categorical(yy)
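To see what the two encoding steps produce, here is a minimal NumPy-only sketch. It assumes np.unique(..., return_inverse=True) as a stand-in for LabelEncoder and np.eye indexing as a stand-in for to_categorical; the label values are made up for illustration and are not taken from the asker's CSV.

```python
import numpy as np

# Hypothetical labels standing in for the 'yy' column of the CSV.
yy = np.array(['C', 'A', 'G', 'B', 'A', 'F', 'D', 'E'])

# Step 1: integer-encode, as LabelEncoder does (classes sorted, codes start at 0).
classes, y_int = np.unique(yy, return_inverse=True)

# Step 2: one-hot encode, as to_categorical does (one column per class).
y_onehot = np.eye(len(classes))[y_int]

print(classes)         # the sorted class names
print(y_int)           # integer codes in the range 0..6
print(y_onehot.shape)  # (samples, classes)
```

With targets of shape (samples, 7), the output layer would also need 7 units with a softmax activation rather than a single unit.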

I assume the target class you are going to predict is binary, i.e. there are only 2 possible values.
If your target is binary, the last layer of the model should use the sigmoid activation function, and the model should be compiled with binary_crossentropy. (sparse_categorical_crossentropy expects integer class labels and an output layer with one unit per class, so it does not fit a single sigmoid unit.)
If the target is multi-class, i.e. more than 2 possible values, you must convert the target to one-hot form with to_categorical from Keras. Then compile the model with categorical_crossentropy, and use the softmax activation function in a last layer that has one unit per class.
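The relationship between the two multi-class losses can be checked by hand. The following is a rough NumPy sketch, not Keras code: the probabilities are invented, and the two formulas are the textbook definitions of categorical and sparse categorical cross-entropy.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples and 4 classes (each row sums to 1).
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])

y_int = np.array([0, 1, 3])   # integer targets
y_onehot = np.eye(4)[y_int]   # the same targets, one-hot encoded

# Categorical cross-entropy: uses the one-hot matrix.
cce = -np.sum(y_onehot * np.log(probs), axis=1).mean()

# Sparse categorical cross-entropy: picks the true-class probability by index.
scce = -np.log(probs[np.arange(len(y_int)), y_int]).mean()

print(np.isclose(cce, scce))
```

Both losses measure the same quantity; they differ only in the target format they accept, which is why the error message offers either conversion or the sparse loss.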


Getting Value Error while creating RNN model

I'm getting this error: ValueError: logits and labels must have the same shape, received ((32, 1) vs (32, 23740))
This is my full code:
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
import re
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
from keras.utils import pad_sequences
# All data processing stuff
data = pd.read_csv('test.csv') # Load the data
# convert Sentiment type to string
data['Sentiment'] = data['Sentiment'].astype(str)
# remove special characters and convert to lowercase
data["Tweet"] = data["Tweet"].apply(lambda x: x.lower()) # Convert all tweets to lowercase
data["Tweet"] = data["Tweet"].apply(lambda x: x.replace("[^a-zA-Z0-9]", "")) # Remove special characters
data["Sentiment"] = data["Sentiment"].apply(lambda x: x.lower())
# Initialize the Tokenizer
tokenizer = Tokenizer()
# Fit the Tokenizer on the text data
tokenizer.fit_on_texts(data["Tweet"] + data["Sentiment"])
token_sequences1 = tokenizer.texts_to_matrix(data["Tweet"])
token_sequences2 = tokenizer.texts_to_matrix(data["Sentiment"])
padded_sequences1 = pad_sequences(token_sequences1)
padded_sequences2 = pad_sequences(token_sequences2)
# split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(
padded_sequences1, padded_sequences2, test_size=0.2, random_state=42)
# create the RNN model
model = Sequential()
# 1000 is the number of words in the vocabulary, 128 is the dimension of the embedding vector,
model.add(Embedding(1000, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
# compile the model with a specified loss function and optimizer
# binary_crossentropy is used for binary classification problems like this one (positive or negative sentiment)
# accuracy is the metric used to evaluate the model performance (the percentage of correct predictions)
# the loss function and the optimizer can be changed to see if the model performance improves or not
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# train the model on the training data and put verbose=1 to see the training progress
model.fit(x_train, y_train, batch_size=32, epochs=10, verbose=1)
# evaluate the model on the testing data
y_pred = model.predict(x_test)
# calculate the evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# print the evaluation metrics
print("Accuracy: {:.2f}".format(accuracy))
print("Precision: {:.2f}".format(precision))
print("Recall: {:.2f}".format(recall))
print("F1 Score: {:.2f}".format(f1))
I'm trying to build a sentiment analyser for news-headline tweets from this data.
Please suggest a solution; I have tried every solution I could find.

How to match dimensions in CNN

I'm trying to build a CNN whose goal is to predict the label from 3 features, but it raises a dimension error.
Could someone help me?
Updated after comments from @M.Innat:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential, load_model
from sklearn.metrics import accuracy_score, f1_score, mean_absolute_error
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam
from sklearn import metrics
import tensorflow as tf
import random
# Create data
n = 8500
l = [2, 3, 4, 5,6]
k = int(np.ceil(n/len(l)))
labels = [item for item in l for i in range(k)]
labels =np.array(labels)
label_unique = np.unique(labels)
x = np.linspace(613000, 615000, num=n) + np.random.uniform(-5, 5, size=n)
y = np.linspace(7763800, 7765800, num=n) + np.random.uniform(-5, 5, size=n)
z = np.linspace(1230, 1260, num=n) + np.random.uniform(-5, 5, size=n)
X = np.column_stack((x,y,z))
Y = labels
# Split the dataset into training and testing.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1234)
print('n_features: {} \n seq_len: {} \n exit_un: {}'.format(n_features,seq_len,exit_un))
X_train = X_train[..., None][None, ...] # add channel axis + batch axis
Y_train = pd.get_dummies(Y_train) # transform to one-hot encoded
drop_prob = 0.5
my_model = Sequential()
my_model.add(Conv2D(input_shape=(seq_len,n_features,1),filters=32,kernel_size=(3,3),padding='same',activation="relu")) # 1 channel of grayscale.
my_model.add(Conv2D(filters=64,kernel_size=(5,5), padding='same',activation="relu"))
my_model.add(Dense(units = 1024, activation="relu"))
my_model.add(Dense(units = exit_un, activation="softmax"))
n_epochs = 100
batch_size = 10
learn_rate = 0.005
# Define the optimizer and then compile.
my_model.compile(loss = "categorical_crossentropy", optimizer = my_optimizer, metrics=['categorical_crossentropy','accuracy'])
my_summary = my_model.fit(X_train, Y_train, epochs=n_epochs, batch_size = batch_size, verbose = 1)
The error I have is:
ValueError: Data cardinality is ambiguous:
x sizes: 1
y sizes: 5950
Make sure all arrays contain the same number of samples.
You're passing the input samples without the channel axis and without the batch axis. Also, because your loss function is categorical_crossentropy, you should transform your integer labels to one-hot encoding.
drop_prob = 0.5
X_train = X_train[..., None][None, ...] # add channel axis + batch axis
X_train = np.repeat(X_train, repeats=100, axis=0) # batch-ing
Y_train = np.repeat(Y_train, repeats=100, axis=0) # batch-ing
Y_train = pd.get_dummies(Y_train) # transform to one-hot encoded
print(X_train.shape, Y_train.shape)
my_model = Sequential()
Based on the discussion, it seems you need Conv1D layers in the model and need to reshape your samples as mentioned in the comments. The linked Colab notebook has the working version.
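The shape bookkeeping behind the "Data cardinality is ambiguous" error can be sketched with NumPy alone. The array sizes below are hypothetical stand-ins for the split data, and the Conv1D layout (batch, steps, channels) is the assumption that each of the 3 features is treated as one step with a single channel.

```python
import numpy as np

# Hypothetical stand-ins for the split data: 20 samples, 3 features, labels in {2,...,6}.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
Y_train = np.tile([2, 3, 4, 5, 6], 4)

# What the question does: a leading axis of size 1 makes Keras see a single sample.
x_wrong = X_train[..., None][None, ...]
print(x_wrong.shape)   # (1, 20, 3, 1)

# Conv1D layout: (batch, steps, channels); each feature is one step with one channel.
x_conv1d = X_train[..., None]
print(x_conv1d.shape)  # (20, 3, 1)

# One-hot targets for categorical_crossentropy, one column per distinct label.
label_values, y_idx = np.unique(Y_train, return_inverse=True)
y_onehot = np.eye(len(label_values))[y_idx]

# The first axes now agree, which is what the cardinality check requires.
print(x_conv1d.shape[0] == y_onehot.shape[0])
```

In the question, the same indexing on 5950 training rows gives an x with 1 sample against a y with 5950, which matches the sizes reported in the error.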

Strange results from a neural network build using Keras

I built a sentiment classifier using Keras to predict whether a sentence has a sentiment score of 1, 2, 3, 4 or 5. However, I am getting some strange results. I will first show my code:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
import pandas as pd
import numpy as np
# the data only reflects the structure of the actual data
# the real data has way larger text and more entries
X_train = ['i am glad i heard about that', 'that is one ugly bike']
y_train = pd.Series(np.array([1, 4])) # pandas series
X_test = ['that hurted me']
y_test = pd.Series(np.array([1, 4])) # pandas series
# tokenizing
tokenizer = Tokenizer(num_words = 5)
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)
# performing some padding
padding_len = 4
X_train_seq_padded = pad_sequences(X_train_seq, maxlen = padding_len)
X_test_seq_padded = pad_sequences(X_test_seq, maxlen = padding_len)
# building the model
model = Sequential()
model.add(Dense(16, input_dim = padding_len, activation = 'relu', name = 'hidden-1'))
model.add(Dense(16, activation = 'relu', name = 'hidden-2'))
model.add(Dense(16, activation = 'relu', name = 'hidden-3'))
model.add(Dense(6, activation='softmax', name = 'output_layer'))
# compiling the model
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics=['accuracy'])
# training the model
callbacks = [EarlyStopping(monitor = 'accuracy', patience = 5, mode = 'max')]
my_model = model.fit(X_train_seq_padded, to_categorical(y_train), epochs = 100, batch_size = 1000, callbacks = callbacks, validation_data = (X_test, to_categorical(y_test)))
Using the actual data, I keep getting an accuracy of around 0.67xx (xx being random digits), reached after 1 or 2 epochs, no matter what changes I make to the code (and some are extreme).
I tried changing the padding to 1, 10, 100, 1000.
I tried removing the layer hidden-2 and hidden-3.
I tried adding stop word removal before tokenizing.
I tried using the tanh activation function in the hidden layers.
I used the sgd optimizer.
Example output of one setup:
Now my question is, is there something wrong with my code or are these actual possible results?

Getting non-broadcastable error in my LSTM

I have been trying to apply an LSTM to a CSV file.
The model seems to train, but after training it fails on my test data with an error.
If I modify the code a little bit, I get another error: "ValueError: cannot reshape array of size 1047835 into shape"
Here is the code I'm using:
import math
import matplotlib.pyplot as plt
import keras
import pandas as pd
import numpy as np
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1" #Had to use CPU because of gpus capability was 3.0
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.layers import *
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from keras.callbacks import EarlyStopping
for i in range(60,800):
X_Train = np.reshape(X_Train, (X_Train.shape[0], X_Train.shape[1], 1))
# print(X_train = np.reshape(X_Train, (X_Train.shape[0], X_Train.shape[1], 1)))
#(740, 60, 1)
model = Sequential()
#Adding the first LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50, return_sequences = True, input_shape = (X_Train.shape[1], 1)))
# Adding a second LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50, return_sequences = True))
# Adding a third LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50, return_sequences = True))
# Adding a fourth LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50))
# Adding the output layer
model.add(Dense(units = 1))
# Compiling the RNN
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fitting the RNN to the Training set
model.fit(X_Train, Y_Train, epochs = 100, batch_size = 32)
dataset_train = df.iloc[:800, 1:3]
dataset_test = df.iloc[800:, 1:3]
dataset_total = pd.concat((dataset_train, dataset_test), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,1)
inputs = scaler.transform(inputs)
X_Test = []
for i in range(60, 800):
    X_Test.append(inputs[i-60:i, 0])
X_Test = np.array(X_Test)
X_Test = np.reshape(X_Test, (X_Test.shape[0], X_Test.shape[1], 1))
predicted_stock_price = model.predict(X_Test)
predicted_stock_price = scaler.inverse_transform(predicted_stock_price)
plt.plot(df.loc[800:, 'Date'],dataset_test.values, color = 'red', label = 'Real ASTL Stock Price')
plt.plot(df.loc[800:, 'Date'],predicted_stock_price, color = 'blue', label = 'Predicted ASTL Stock Price')
plt.title('ASTL Stock Price Prediction')
plt.ylabel('ASTL Stock Price')
At some point in your reshaping, the array size is not evenly divisible by the target shape. Take this example:
import numpy as np
data = np.zeros(3936)
out = data.reshape((-1,1,24,2))
This works because 3936/24/2 is an integer, 82.
But in this example
import numpy as np
data = np.zeros(34345)
out = data.reshape((-1,1,24,2))
you end up with the error message ValueError: cannot reshape array of size 34345 into shape (1,24,2), because 34345/24/2 is not an integer.
So looping the way you do is bound to produce errors of that type.
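One way to guard against this is to check divisibility before reshaping and to derive the loop bound from the data instead of hard-coding it. The sketch below is only an illustration: fits and make_windows are hypothetical helper names, and the series is a dummy array rather than the stock data from the question.

```python
import numpy as np

def fits(size, tail_shape):
    """Return True if `size` elements can be reshaped to (-1, *tail_shape)."""
    return size % int(np.prod(tail_shape)) == 0

print(fits(3936, (1, 24, 2)))   # 3936 / 48 = 82 exactly
print(fits(34345, (1, 24, 2)))  # 34345 / 48 is not an integer

def make_windows(series, window):
    """Stack every run of `window` consecutive values: shape (n - window, window, 1)."""
    n = len(series)
    rows = [series[i - window:i] for i in range(window, n)]
    return np.array(rows).reshape(-1, window, 1)

# Hypothetical 1-D series; the loop bound comes from len(series), not a fixed 800.
series = np.arange(100, dtype=float)
X = make_windows(series, window=60)
print(X.shape)  # (40, 60, 1)
```

Because every row has exactly `window` values, the final reshape always divides evenly, whatever the length of the test set.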

ValueError: Input 0 is incompatible with layer conv1d_1: expected ndim=3, found ndim=2

When I try to feed the output of an ELMo embedding layer into a Conv1D layer, I get the error
ValueError: Input 0 is incompatible with layer conv1d_1: expected ndim=3, found ndim=2
I want to add a convolution layer on top of the output of the ELMo embedding layer.
import tensorflow as tf
import tensorflow_hub as hub
import keras.backend as K
from keras import Model
from keras.layers import Input, Lambda, Conv1D, Flatten, Dense
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv("/home/raju/Desktop/spam.csv", encoding='latin-1')
X = df['v2']
Y = df['v1']
le = LabelEncoder()
Y = le.transform(Y)
Y = to_categorical(Y)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25)
elmo = hub.Module('/home/raju/models/elmo')
def embeddings(x):
    return elmo(tf.squeeze(tf.cast(x, dtype=tf.string)), signature='default', as_dict=True)['default']
input_layer = Input(shape=(1,), dtype=tf.string)
embed_layer = Lambda(embeddings, output_shape=(1024,))(input_layer)
conv_layer = Conv1D(4, 2, activation='relu')(embed_layer)
fcc_layer = Flatten()(conv_layer)
output_layer = Dense(2, activation='softmax')(fcc_layer)
model = Model(inputs=[input_layer], outputs=output_layer)
A Conv1D layer expects input of the shape (batch, steps, channels). The channels dimension is missing in your case, and you need to include it even if it is equal to 1. So the output shape of your elmo module should be (1024, 1) (this does not include the batch size). You can add a dimension to the output of the elmo module with tf.expand_dims(x, axis=-1).
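The effect of the extra axis can be illustrated without TensorFlow. In this sketch np.expand_dims is used as a stand-in for tf.expand_dims, the zero array is a hypothetical placeholder for the ELMo output, and the output length is computed from the usual formula for 'valid' padding with stride 1.

```python
import numpy as np

# Hypothetical stand-in for the ELMo 'default' output: one 1024-d vector per sentence.
batch = 8
embed = np.zeros((batch, 1024), dtype=np.float32)
print(embed.ndim)  # 2, which is what Conv1D rejects

# np.expand_dims mirrors tf.expand_dims(x, axis=-1): append a channels axis of size 1.
embed_3d = np.expand_dims(embed, axis=-1)
print(embed_3d.shape)  # (8, 1024, 1)

# Output length of Conv1D(4, 2) with the default 'valid' padding and stride 1.
kernel_size, filters = 2, 4
steps_out = embed_3d.shape[1] - kernel_size + 1
print((batch, steps_out, filters))  # (8, 1023, 4)
```

In the model from the question, the expand_dims call would go inside the embeddings function, with the Lambda layer's output_shape changed to (1024, 1) to match.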