I have created a neural network in Keras. The program runs, but there is a problem with the result. It is a Forex forecasting network, and the forecast should be 0 or 1, as provided in the training dataset, but the result is a float between 0 and 1, like "[[0.47342286]]".
I have tried to use numpy athmax, but it only ever results in one answer.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
from sklearn.preprocessing import MinMaxScaler
from ta import *
dataset = pd.read_csv('C:/Users/SIGMA COM/PycharmProjects/deep/GBP_JPY Historical Data.csv',index_col="Date",parse_dates=True)
dataset = dataset[::-1]
# initial value
step_size = 4
batch_sizes = 1
dataset['Diff'] = dataset['Open'] - dataset['Price']
dataset['Range'] = dataset['High'] - dataset['Low']
dataset['Rsi'] = rsi(close=dataset['Price'],n=4,fillna=True)
dataset['Macd'] = macd(close=dataset['Price'],n_fast=12,n_slow=26,fillna=True)
dataset['Cci'] = cci(high=dataset['High'],low=dataset['Low'],close=dataset['Price'],n=20,fillna=True)
# dataset['Rsi'] = dataset['Rsi'] /100.0
# # dataset['Macd'] = dataset['Macd'] /2.0
# dataset['Cci'] = dataset['Cci'] / 500.0
training_set = dataset[['Rsi','Macd','Cci','Price','Low','High','Open','Signal']]
sc = MinMaxScaler()
training_set_scaled = sc.fit_transform(training_set)
# Creating a data structure with 60 timesteps and 1 output
X_train = []
y_train = []
for i in range(60, 1258):
    X_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i, -1:])
X_train, y_train = np.array(X_train), np.array(y_train)
# Reshaping
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
# Part 2 - Building the RNN
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
print((X_train.shape[1], 1))
# Initialising the RNN
regressor = Sequential()
# Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
# Adding the output layer
regressor.add(Dense(units = 1,activation='sigmoid'))
# Compiling the RNN
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs = 10, batch_size = 32)
result = regressor.predict(np.reshape(X_train[100],(1,60,1)))
I want the model to predict a class, 0 or 1.

This behavior is expected, because the sigmoid function always returns a number between zero and one: sigmoid(x) = 1 / (1 + exp(-x)).
So if your class labels are either 0 or 1, which seems to be the case here, for a binary classification problem you can just round the output to get your class prediction. Let's make a distinction between a classification and a regression problem here: regression is like finding the "line of best fit"; that is, the model is being trained to approximate the data. This appears to be what you're doing here: you're minimizing the mean squared error and searching for the model that best approximates your data, but that by itself gives a continuous value, not a class label.
If you want to actually make a classification, you can just round all elements of the result of regressor.predict to 0 or 1, and then compare your predictions with the true labels. This can be done easily in numpy like so: numpy.around(your_predictions, decimals=0). Note that the decimals argument is not strictly required, since it defaults to 0, but it's nice for clarity.
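To make the rounding step concrete, here is a minimal sketch. The array below is made up; it just has the same shape as what regressor.predict returns for a batch of four inputs. An explicit threshold is shown next to numpy.around because the two differ at exactly 0.5.

```python
import numpy as np

# Hypothetical sigmoid outputs, one probability per row, shaped like the
# result of regressor.predict on four inputs.
predictions = np.array([[0.47342286], [0.81], [0.12], [0.5]])

# Round to the nearest integer to get hard 0/1 labels.
rounded = np.around(predictions, decimals=0).astype(int)

# Explicit threshold. np.around rounds 0.5 to the nearest even value (0),
# whereas ">= 0.5" maps it to 1.
thresholded = (predictions >= 0.5).astype(int)

print(rounded.ravel())      # [0 1 0 0]
print(thresholded.ravel())  # [0 1 0 1]
```

Which of the two you want depends on how you would like a prediction of exactly 0.5 to be treated; in practice it rarely matters.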
As for using numpy.argmax (I'm going to assume that's what you meant by athmax since I can't find a function with that spelling), it will give you the same label for everything because it returns the index of the largest element in an array. Since your output array has length one (because it's simply a single neuron that calculates the logistic function), it will always return index zero! However, you're sort of on the right track: if your last layer were instead Dense(units=n_classes, activation='softmax'), it would output a probability distribution over the labels for each row of data. In that case, numpy.argmax is correct.
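A small sketch of the difference, using made-up arrays in place of real model outputs (the two-column array stands in for what a hypothetical two-unit softmax layer would produce):

```python
import numpy as np

# One sigmoid output per row: there is only one column, so argmax over
# the last axis can only ever return index 0.
sigmoid_out = np.array([[0.47], [0.91], [0.08]])
sigmoid_labels = np.argmax(sigmoid_out, axis=-1)

# Hypothetical softmax output with two classes: each row sums to 1 and
# argmax picks the column (class) with the highest probability.
softmax_out = np.array([[0.53, 0.47], [0.09, 0.91], [0.92, 0.08]])
softmax_labels = np.argmax(softmax_out, axis=-1)

print(sigmoid_labels)  # [0 0 0]
print(softmax_labels)  # [0 1 0]
```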
Here's a Tensorflow tutorial on classification that I found super helpful when I was just learning it myself. It uses softmax instead of sigmoid like you, but I think it's fairly adaptable to your needs:
Hope this helps!


Keras (TensorFlow 2.6 backend) masks are supposed to propagate through the network, as mentioned in the docs:
When using the Functional API or the Sequential API, a mask generated by an Embedding or Masking layer will be propagated through the network for any layer that is capable of using them.
Both Conv1D and GlobalMaxPool1D support masks, but the mask is not propagated, as demonstrated by the following example:
model1 applies mask and GlobalMaxPool1D
model2 applies mask and Conv1D
model3 applies mask and Conv1D followed by GlobalMaxPool1D
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import initializers
import numpy as np
mask_val = 10.
inp = layers.Input(shape = (4,3))
masked = layers.Masking(mask_value = mask_val)(inp)
max_pool = layers.GlobalMaxPool1D()(masked)
model1 = models.Model(inputs = inp, outputs = max_pool)
#--- initialize the conv kernel to -1. so it's easy to interpret output
conv = layers.Conv1D(1, 2, padding = 'valid', kernel_initializer = initializers.Constant(-1.))(masked)
model2 = models.Model(inputs = inp, outputs = conv)
out = layers.GlobalMaxPool1D()(conv)
model3 = models.Model(inputs = inp, outputs = out)
Now test these 3 models on a simple input: 2 timesteps of dimension 3, followed by 2 masked timesteps:
x0 = np.concatenate((np.ones((1,2,3)), mask_val * np.ones((1,2,3))), axis = 1)
model1(x0) # outputs [1,1,1], as expected
model2(x0) # outputs [-6, -3, 0] as expected
model3(x0) # outputs [0], but should output [-3] as the 0 value should be masked
Am I missing something?
So, I was wrong thinking that Conv1D and GlobalMaxPool1D support masks. Turns out that:
layers.GlobalMaxPooling1D().supports_masking #--- this property is False
layers.Conv1D(1,1).supports_masking #--- this property is False
They seemed to support masks because the Masking layer replaces the masked values with 0s, and with the first input the outputs of the Conv1D layer and the GlobalMaxPooling1D layer happened not to be affected by these 0s.
A different input (with -1 instead of 1 as the un-masked values) shows it:
x0 = np.concatenate((-np.ones((1,2,3)), mask_val * np.ones((1,2,3))), axis = 1)
l_masked = layers.Masking(mask_value = mask_val)
l_max_pool = layers.GlobalMaxPool1D()
l_max_pool(l_masked(x0)) #--- returns [0,0,0] - ignoring the mask :(
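For what it's worth, here is a NumPy-only sketch of what a mask-aware global max pool would have to compute. This is not a Keras API: masked_global_max_pool is a hypothetical helper, and it assumes the rule documented for the Masking layer, namely that a timestep is masked when all of its features equal mask_value.

```python
import numpy as np

def masked_global_max_pool(x, mask_value):
    """Hypothetical helper: max over the time axis, skipping masked timesteps."""
    # A timestep is kept if at least one feature differs from mask_value.
    keep = np.any(x != mask_value, axis=-1, keepdims=True)  # (batch, time, 1)
    # Push masked timesteps to -inf so they can never win the max.
    return np.where(keep, x, -np.inf).max(axis=1)

mask_val = 10.
x0 = np.concatenate((-np.ones((1, 2, 3)), mask_val * np.ones((1, 2, 3))), axis=1)
pooled = masked_global_max_pool(x0, mask_val)
print(pooled)  # [[-1. -1. -1.]]
```

Inside a model, the same idea would presumably have to live in a custom layer that receives the mask and sets supports_masking itself.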

I built a sentiment classifier using Keras to predict whether a sentence has a sentiment score of 1, 2, 3, 4 or 5. However, I am getting some strange results. I will first show my code:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
import pandas as pd
import numpy as np
# the data only reflects the structure of the actual data
# the real data has way larger text and more entries
X_train = ['i am glad i heard about that', 'that is one ugly bike']
y_train = pd.Series(np.array([1, 4])) # pandas series
X_test = ['that hurted me']
y_test = pd.Series(np.array([1, 4])) # pandas series
# tokenizing
tokenizer = Tokenizer(num_words = 5)
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)
# performing some padding
padding_len = 4
X_train_seq_padded = pad_sequences(X_train_seq, maxlen = padding_len)
X_test_seq_padded = pad_sequences(X_test_seq, maxlen = padding_len)
# building the model
model = Sequential()
model.add(Dense(16, input_dim = padding_len, activation = 'relu', name = 'hidden-1'))
model.add(Dense(16, activation = 'relu', name = 'hidden-2'))
model.add(Dense(16, activation = 'relu', name = 'hidden-3'))
model.add(Dense(6, activation='softmax', name = 'output_layer'))
# compiling the model
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics=['accuracy'])
# training the model
callbacks = [EarlyStopping(monitor = 'accuracy', patience = 5, mode = 'max')]
my_model = model.fit(X_train_seq_padded, to_categorical(y_train), epochs = 100, batch_size = 1000, callbacks = callbacks, validation_data = (X_test, to_categorical(y_test)))
Using the actual data I keep getting results around 0.67xx (xx random numbers) which are reached after 1/2 epochs, no matter what changes to the code I introduce (and some are extreme).
I tried changing the padding to 1, 10, 100, 1000.
I tried removing the layer hidden-2 and hidden-3.
I tried adding stop word removal before tokenizing.
I tried using the tanh activation function in the hidden layers.
I used the sgd optimizer.
Example output of one setup:
Now my question is, is there something wrong with my code or are these actual possible results?

I have a ResNet-based siamese network built on the idea of minimizing the L2 distance between two images and then applying a sigmoid, so that it gives a {0: 'same', 1: 'different'} output; the gradients flow back through the network according to how far off the prediction is. The problem is that the gradient updates are too small, because the distance only changes within {0, 1}, so I thought of using the same architecture but with Triplet Loss.
I1 = Input(shape=image_shape)
I2 = Input(shape=image_shape)
res_m_1 = ResNet50(include_top=False, weights='imagenet', input_tensor=I1, pooling='avg')
res_m_2 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
x1 = res_m_1.output
x2 = res_m_2.output
# x = Flatten()(x) or use this one if not using any pooling layer
distance = Lambda( lambda tensors : K.abs( tensors[0] - tensors[1] )) ([x1,x2] )
final_output = Dense(1,activation='sigmoid')(distance)
siamese_model = Model(inputs=[I1,I2], outputs=final_output)
So how can I change it to use the Triplet Loss function? What adjustments should be done here in order to get this done? One change will be that I'll have to calculate
res_m_3 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
x3 = res_m_3.output
One thing found in tf docs is triplet-semi-hard-loss and is given as:
As shown in the paper, the best results are from triplets known as "Semi-Hard". These are defined as triplets where the negative is farther from the anchor than the positive, but still produces a positive loss. To efficiently find these triplets we utilize online learning and only train from the Semi-Hard examples in each batch.
Another implementation of Triplet Loss which I found on Kaggle is: Triplet Loss Keras
Which one should I use and most importantly, HOW?
P.S: People also use something like: x = Lambda(lambda x: K.l2_normalize(x,axis=1))(x) after model.output. Why is that? What is this doing?
Following this answer of mine, and with the role of TripletSemiHardLoss in mind, we could do the following:
import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_datasets as tfds
from tensorflow.keras import models, layers
def _normalize_img(img, label):
    img = tf.cast(img, tf.float32) / 255.
    return (img, label)
train_dataset, test_dataset = tfds.load(name="mnist", split=['train', 'test'], as_supervised=True)
# Build your input pipelines
train_dataset = train_dataset.shuffle(1024).batch(BATCH_SIZE)
train_dataset = train_dataset.map(_normalize_img)
test_dataset = test_dataset.batch(BATCH_SIZE)
test_dataset = test_dataset.map(_normalize_img)
inputs = layers.Input(shape=(28, 28, 1))
resNet50 = tf.keras.applications.ResNet50(include_top=False, weights=None, input_tensor=inputs, pooling='avg')
outputs = layers.Dense(LATENT_DEM, activation=None)(resNet50.output) # No activation on final dense layer
outputs = layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1))(outputs) # L2 normalize embedding
siamese_model = models.Model(inputs=inputs, outputs=outputs)
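# Compile the model (the optimizer here is an assumption; the loss is the TripletSemiHardLoss discussed above)
siamese_model.compile(optimizer='adam', loss=tfa.losses.TripletSemiHardLoss())
# Train the network (the number of epochs is an assumption)
history = siamese_model.fit(train_dataset, epochs=5)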
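As for what the l2_normalize Lambda is doing: it rescales every embedding to unit length, so distances between embeddings depend only on direction and stay in a bounded range, which makes a fixed margin meaningful. Below is a NumPy sketch of that, together with the plain hinge-style triplet loss. It is only an illustration under my own assumptions (squared Euclidean distance, a default margin of 1.0, and helper names that are mine); it does not reproduce the semi-hard mining that TripletSemiHardLoss performs inside each batch.

```python
import numpy as np

def l2_normalize(x, axis=1, eps=1e-12):
    # Scale each row to unit Euclidean length.
    norm = np.sqrt(np.sum(np.square(x), axis=axis, keepdims=True))
    return x / np.maximum(norm, eps)

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Squared L2 distances anchor-positive and anchor-negative.
    d_pos = np.sum(np.square(anchor - positive), axis=1)
    d_neg = np.sum(np.square(anchor - negative), axis=1)
    # Zero once the negative is at least `margin` farther away than the positive.
    return np.maximum(d_pos - d_neg + margin, 0.0)

anchor = l2_normalize(np.array([[3.0, 4.0]]))      # unit vector (0.6, 0.8)
positive = l2_normalize(np.array([[6.0, 8.0]]))    # same direction as the anchor
negative = l2_normalize(np.array([[-4.0, 3.0]]))   # orthogonal to the anchor

loss = triplet_loss(anchor, positive, negative)
print(loss)
```

After normalization the anchor and the positive coincide even though their raw vectors had different lengths, and the squared distance to the orthogonal negative is 2, so with a margin of 1.0 this triplet contributes no loss.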

I'm trying to code a multiclass output, and the classes are ['A','B','C','D','E','F','G'].
Could someone explain the following error message:
"ValueError: You are passing a target array of shape (79, 1) while using as loss categorical_crossentropy. categorical_crossentropy expects targets to be binary matrices (1s and 0s) of shape (samples, classes). If your targets are integer classes, you can convert them to the expected format via:
from keras.utils.np_utils import to_categorical
y_binary = to_categorical(y_int)
Alternatively, you can use the loss function sparse_categorical_crossentropy instead, which does expect integer targets."
My code:
# Part 1 - Data Preprocessing
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataa = pd.read_csv('test_out.csv')
XX = dataa.iloc[:, 0:4].values
yy = dataa.iloc[:, 4].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_Y_1 = LabelEncoder()
yy = labelencoder_Y_1.fit_transform(yy)
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(XX, yy, test_size = 0.2,
random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Part 2 - Now let's make the ANN!
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu',
input_dim = 4))
# Adding the second hidden layer
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(output_dim = 1, init = 'uniform', activation =
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy',
metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, nb_epoch = 50)
# Part 3 - Making the predictions and evaluating the model
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
The problem lies in this portion of your code,
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_Y_1 = LabelEncoder()
yy = labelencoder_Y_1.fit_transform(yy)
You forgot to one-hot encode yy. Note that LabelEncoder only transforms your categorical data into integer codes, i.e. [A, B, C, D, E, F, G] to [0, 1, 2, 3, 4, 5, 6]. You have to one-hot encode it since you want to use softmax activation and categorical_crossentropy (I'm over-simplifying, but that's the gist).
So, it should have been like this,
# Encoding categorical data
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
labelencoder_Y_1 = LabelEncoder()
yy = labelencoder_Y_1.fit_transform(yy)
yy = to_categorical(yy)
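To show what the two steps produce, here is a NumPy-only sketch with a made-up label column; np.searchsorted and np.eye stand in for LabelEncoder and to_categorical respectively (this substitution is mine, purely for illustration).

```python
import numpy as np

classes = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'G'])  # sorted class names
labels = np.array(['A', 'C', 'B', 'G', 'A'])             # made-up target column

# Step 1: integer codes starting at 0, as LabelEncoder would give.
codes = np.searchsorted(classes, labels)

# Step 2: one row per sample, one column per class, as to_categorical would give.
one_hot = np.eye(len(classes), dtype=np.float32)[codes]

print(codes)          # [0 2 1 6 0]
print(one_hot.shape)  # (5, 7)
```

The targets now have shape (samples, 7), so the softmax output layer needs 7 units to match.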
I assume the target class you are going to predict is binary, i.e. there are only 2 possible values.
If your target is binary, then the last layer of the model should use a sigmoid activation function, and the model should be compiled with binary_crossentropy. (sparse_categorical_crossentropy also accepts integer targets, but it expects a softmax output with one unit per class.)
If the target is multi-class, i.e. more than 2 possible values, you must convert the target to categorical with the help of to_categorical from Keras. Then you should compile your model with categorical_crossentropy, and the last layer in the model should use a softmax activation function.
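To see how the two losses named in the error message relate, here is a NumPy sketch with made-up softmax outputs. It computes the cross-entropy once from one-hot targets and once from integer targets; the formulas are the textbook ones, not Keras internals.

```python
import numpy as np

# Hypothetical softmax outputs for two samples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
y_int = np.array([0, 1])        # integer class labels
y_onehot = np.eye(3)[y_int]     # the same labels, one-hot encoded

# categorical cross-entropy: expects one-hot targets of shape (samples, classes)
cce = -np.sum(y_onehot * np.log(probs), axis=1)

# sparse categorical cross-entropy: expects integer targets of shape (samples,)
scce = -np.log(probs[np.arange(len(y_int)), y_int])

print(np.allclose(cce, scce))  # True
```

The loss values are identical; only the expected target format differs, which is exactly what the ValueError is complaining about.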

I am currently training a CNN on MNIST, and the output probabilities (softmax) are giving [0.1,0.1,...,0.1] as training goes on. The initial values aren't uniform, so I can't figure out if I'm doing something stupid here?
I'm only training for 15 steps, just to see how training progresses; even though that's a low number, I don't think that should result in uniform predictions?
import numpy as np
import tensorflow as tf
import imageio
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
# Getting data
from sklearn.model_selection import train_test_split
def one_hot_encode(data):
    new_ = []
    for i in range(len(data)):
        _ = np.zeros([10],dtype=np.float32)
        _[int(data[i])] = 1.0
        new_.append(_)
    return new_
data = np.asarray(mnist["data"],dtype=np.float32)
labels = np.asarray(mnist["target"],dtype=np.float32)
labels = one_hot_encode(labels)
tr_data,test_data,tr_labels,test_labels = train_test_split(data,labels,test_size = 0.1)
tr_data = np.asarray(tr_data)
tr_data = np.reshape(tr_data,[len(tr_data),28,28,1])
test_data = np.asarray(test_data)
test_data = np.reshape(test_data,[len(test_data),28,28,1])
tr_labels = np.asarray(tr_labels)
test_labels = np.asarray(test_labels)
def get_conv(x,shape):
    weights = tf.Variable(tf.random_normal(shape,stddev=0.05))
    biases = tf.Variable(tf.random_normal([shape[-1]],stddev=0.05))
    conv = tf.nn.conv2d(x,weights,[1,1,1,1],padding="SAME")
    return tf.nn.relu(tf.nn.bias_add(conv,biases))
def get_pool(x,shape):
    return tf.nn.max_pool(x,ksize=shape,strides=shape,padding="SAME")
def get_fc(x,shape):
    sh = x.get_shape().as_list()
    dim = 1
    for i in sh[1:]:
        dim *= i
    x = tf.reshape(x,[-1,dim])
    weights = tf.Variable(tf.random_normal(shape,stddev=0.05))
    return tf.nn.relu(tf.matmul(x,weights) + tf.Variable(tf.random_normal([shape[1]],stddev=0.05)))
#Creating model
x = tf.placeholder(tf.float32,shape=[None,28,28,1])
y = tf.placeholder(tf.float32,shape=[None,10])
conv1_1 = get_conv(x,[3,3,1,128])
conv1_2 = get_conv(conv1_1,[3,3,128,128])
pool1 = get_pool(conv1_2,[1,2,2,1])
conv2_1 = get_conv(pool1,[3,3,128,512])
conv2_2 = get_conv(conv2_1,[3,3,512,512])
pool2 = get_pool(conv2_2,[1,2,2,1])
conv3_1 = get_conv(pool2,[3,3,512,1024])
conv3_2 = get_conv(conv3_1,[3,3,1024,1024])
conv3_3 = get_conv(conv3_2,[3,3,1024,1024])
conv3_4 = get_conv(conv3_3,[3,3,1024,1024])
pool3 = get_pool(conv3_4,[1,3,3,1])
fc1 = get_fc(pool3,[9216,1024])
fc2 = get_fc(fc1,[1024,10])
softmax = tf.nn.softmax(fc2)
loss = tf.losses.softmax_cross_entropy(logits=fc2,onehot_labels=y)
train_step = tf.train.AdamOptimizer().minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(15):
    indices = np.random.randint(len(tr_data),size=[200])
    batch_data = tr_data[indices]
    batch_labels = tr_labels[indices]
    sess.run(train_step,feed_dict={x:batch_data,y:batch_labels})
Thank you so much.
There are several issues with your code, including elementary ones. I strongly suggest you first go through the Tensorflow step-by-step tutorials for MNIST, MNIST For ML Beginners and Deep MNIST for Experts.
In short, regarding your code:
First, your final layer fc2 should not have a ReLU activation.
Second, the way you build your batches, i.e.
indices = np.random.randint(len(tr_data),size=[200])
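is to just grab random samples (with replacement) in each iteration, which is far from the usual way of doing it: normally you shuffle the training set once per epoch and step through it in consecutive mini-batches, so that every sample gets seen.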
Third, the data you feed into the network are not normalized in [0, 1], as they should be:
np.max(tr_data[0]) # get the max value of your first training sample
# 255.0
The third point was initially puzzling for me, too, since in the aforementioned Tensorflow tutorials they don't seem to normalize the data either. But close inspection revealed the reason: if you import the MNIST data through the Tensorflow-provided utility functions (instead of the scikit-learn ones, as you do here), they come already normalized in [0, 1], something that is nowhere hinted at:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
np.max(mnist.train.images[0]) # get the max value of the first training sample
# 0.99607849
This is an admittedly strange design decision - as far as I am aware, in all other similar cases/tutorials normalizing the input data is an explicit part of the pipeline (see e.g. the Keras example), and with good reason (it is something you will certainly be expected to do yourself later, when using your own data).
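For the second and third points, a minimal NumPy sketch of what I mean. The image array is random stand-in data with the same layout as tr_data, and iterate_minibatches is a hypothetical helper of mine, not a TensorFlow or scikit-learn function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for tr_data: 10 fake 28x28x1 images with raw pixel values 0..255.
fake_images = rng.integers(0, 256, size=(10, 28, 28, 1)).astype(np.float32)

# Third point: scale the pixels into [0, 1] before feeding the network.
scaled = fake_images / 255.0

def iterate_minibatches(n_samples, batch_size, rng):
    """Hypothetical helper: yield index batches that cover each sample once per epoch."""
    order = rng.permutation(n_samples)  # shuffle once per epoch
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

# Second point: one epoch visits every sample exactly once.
batches = list(iterate_minibatches(len(scaled), 4, rng))
seen = np.concatenate(batches)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)  # True True
print(sorted(seen.tolist()))                     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In the training loop you would then index tr_data and tr_labels with each batch of indices, and call the generator again at the start of every epoch.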