How to use softmax in tensorflow

I made a TensorFlow sample that takes an integer and classifies it by its remainder when divided by 5.
https://colab.research.google.com/drive/1CQ5IKymDCuCzWNfgKQrZZSL3ifyzRJrA?usp=sharing
import numpy as np
from keras import models
from keras import optimizers
from keras import layers
from keras.utils import to_categorical

num_of_rows = 1500
num_of_classes = 5

X = np.abs(np.floor(np.random.randn(num_of_rows, 1) * 10000))
y = X % num_of_classes
X = X / 100000

model = models.Sequential()
model.add(layers.Dense(25, input_dim=1))
model.add(layers.Dense(25, activation='relu'))
model.add(layers.Dense(25))
model.add(layers.Dense(num_of_classes, activation='softmax'))
model.compile(optimizer=optimizers.Adam(lr=0.001), loss='categorical_crossentropy')
model.summary()

X_normal = X.astype('float')
print(X_normal.shape)
test_label = to_categorical(y, num_classes=num_of_classes)
print(y[:5])
model.fit(X_normal, test_label, epochs=100, batch_size=10)
When I run the notebook at the link above, the loss does not decrease:
Epoch 1/100
150/150 [==============================] - 0s 2ms/step - loss: 1.6094
Epoch 2/100
150/150 [==============================] - 0s 2ms/step - loss: 1.6102
Epoch 3/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6091
Epoch 4/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6090
Epoch 5/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6089
Epoch 6/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6089
Epoch 7/100
150/150 [==============================] - 0s 2ms/step - loss: 1.6089
Epoch 8/100
150/150 [==============================] - 0s 2ms/step - loss: 1.6090
Epoch 9/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6091
Epoch 10/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6084
I need advice on what else I need to do.

After some trial and error, I found a solution.
The input was just a single integer, i.e. one-dimensional, like this:
X = [100, 15, 48, 17, 22]
I changed each input so that it is a vector of the integer's digits instead:
X = [[1,0,0], [0,1,5], [0,4,8], [0,1,7], [0,2,2]]
With this shape, the loss decreases as expected.
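For reference, here is a minimal sketch of how such a digit encoding could be built. The helper name and the fixed width of three digits are my own assumptions, not part of the original notebook:

import numpy as np

def to_digit_features(values, width=3):
    # Split each integer into `width` decimal digits, most significant first.
    values = np.asarray(values, dtype=int)
    digits = np.zeros((len(values), width), dtype=float)
    for i in range(width):
        digits[:, width - 1 - i] = (values // 10**i) % 10
    return digits

X = to_digit_features([100, 15, 48, 17, 22])
print(X)
# [[1. 0. 0.]
#  [0. 1. 5.]
#  [0. 4. 8.]
#  [0. 1. 7.]
#  [0. 2. 2.]]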


How to do a weighted combine of two tensors with trainable weighting?

I'd like to have a layer that takes two inputs of the same dimension, A and B, and generates an output C = A + w*B, where w is a trainable scalar. In my case, A and B come from a 1D CNN, so they have shape (batch, time, features). Is there a way to make an existing layer perform this function, or do I need to code up a custom layer?
Since the layer has state (the trainable scalar), you want to use subclassing. Here's a possible implementation:
import numpy as np
import tensorflow as tf
from tensorflow import keras as K

class WeightedSum(K.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # a single trainable scalar weight
        self.W = self.add_weight("W", shape=[1], dtype=tf.float32, trainable=True)

    def call(self, inputs, *args, **kwargs):
        A, B = inputs
        return A + self.W * B
If you want to check that W is trainable, you can see it from here:
input = K.layers.Input(shape=10)
x1 = K.layers.Dense(10, trainable=False)(input)
x2 = K.layers.Dense(10, trainable=False)(input)
sum = WeightedSum()([x1, x2])
res = K.layers.Dense(1, trainable=False)(sum)
model = K.models.Model(inputs = [input], outputs = [res])
model.compile(K.optimizers.Adam(), loss="binary_crossentropy")
model.fit(np.random.random((100, 10)), np.random.choice([0,1], (100,)), epochs=10)
Output:
Epoch 1/10
4/4 [==============================] - 0s 9ms/step - loss: 6.5532
Epoch 2/10
4/4 [==============================] - 0s 5ms/step - loss: 6.5522
Epoch 3/10
4/4 [==============================] - 0s 5ms/step - loss: 6.5513
Epoch 4/10
4/4 [==============================] - 0s 5ms/step - loss: 6.5504
Epoch 5/10
4/4 [==============================] - 0s 4ms/step - loss: 6.5500
Epoch 6/10
4/4 [==============================] - 0s 5ms/step - loss: 6.5491
Epoch 7/10
4/4 [==============================] - 0s 5ms/step - loss: 6.5484
Epoch 8/10
4/4 [==============================] - 0s 5ms/step - loss: 6.5479
Epoch 9/10
4/4 [==============================] - 0s 5ms/step - loss: 6.5472
Epoch 10/10
4/4 [==============================] - 0s 5ms/step - loss: 6.5466
As you can see, even though all the other layers are frozen, the loss still decreases, because W is being optimized.
The fact that the two tensors come from a 1D conv should not matter, as long as they have the same (or broadcast-compatible) shapes.
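As a quick check of the (batch, time, features) case, something along these lines should work; the layer sizes here are placeholders, and the snippet reuses the imports and the WeightedSum class from above:

# Hypothetical shapes just to illustrate the (batch, time, features) case
inp = K.layers.Input(shape=(50, 8))              # (time, features)
a = K.layers.Conv1D(16, 3, padding="same")(inp)
b = K.layers.Conv1D(16, 3, padding="same")(inp)
c = WeightedSum()([a, b])                        # still (batch, 50, 16)
m = K.models.Model(inp, c)
print(m(np.random.random((4, 50, 8))).shape)     # (4, 50, 16)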

Regression Model Problems

What is wrong with this model? I am new to learning about this stuff. Did I compile the model incorrectly or is it the structure itself?
Here's what the code looked like:
y_set was defined as a list of floats, [54.7, 52.5, 51.4, 51.5, 50.5], and so was x_set, [0, 1.5, 2, 2.5, 3.5].
Here's the code and the results for the training:
model = Sequential([
    Dense(units=1, input_shape=[1]),
    Dense(units=60, activation='relu'),
    Dense(units=1)])
model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['mae'])
model.fit(x_set, y_set, epochs=10)
This was the output it gave me:
Epoch 1/10
1/1 [==============================] - 0s 8ms/step - loss: 1519.2493 - mae: 37.8005
Epoch 2/10
1/1 [==============================] - 0s 9ms/step - loss: 577948.8750 - mae: 674.4330
Epoch 3/10
1/1 [==============================] - 0s 8ms/step - loss: 159614431746567700480.0000 - mae: 11284396032.0000
Epoch 4/10
1/1 [==============================] - 0s 8ms/step - loss: inf - mae: inf
Epoch 5/10
1/1 [==============================] - 0s 9ms/step - loss: nan - mae: nan
Epoch 6/10
1/1 [==============================] - 0s 8ms/step - loss: nan - mae: nan
Epoch 7/10
1/1 [==============================] - 0s 8ms/step - loss: nan - mae: nan
Epoch 8/10
1/1 [==============================] - 0s 9ms/step - loss: nan - mae: nan
Epoch 9/10
1/1 [==============================] - 0s 10ms/step - loss: nan - mae: nan
Epoch 10/10
1/1 [==============================] - 0s 8ms/step - loss: nan - mae: nan
Obviously, it does not work.
You need to add activation functions to the layers in the model definition to fix this issue.
Please check this fixed code:
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

x_set = np.array([54.7, 52.5, 51.4, 51.5, 50.5])
y_set = np.array([0, 1.5, 2, 2.5, 3.5])

model = Sequential([
    Dense(units=1, input_shape=[1], activation='relu'),
    Dense(units=60, activation='relu'),
    Dense(units=1, activation='sigmoid')])
model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['mae'])
model.fit(x_set, y_set, epochs=10)
Output:
Epoch 1/10
1/1 [==============================] - 1s 1s/step - loss: 2.1748 - mae: 1.3081
Epoch 2/10
1/1 [==============================] - 0s 13ms/step - loss: 2.1692 - mae: 1.3063
Epoch 3/10
1/1 [==============================] - 0s 17ms/step - loss: 2.1656 - mae: 1.3051
Epoch 4/10
1/1 [==============================] - 0s 17ms/step - loss: 2.1632 - mae: 1.3043
Epoch 5/10
1/1 [==============================] - 0s 18ms/step - loss: 2.1614 - mae: 1.3037
Epoch 6/10
1/1 [==============================] - 0s 29ms/step - loss: 2.1600 - mae: 1.3033
Epoch 7/10
1/1 [==============================] - 0s 30ms/step - loss: 2.1589 - mae: 1.3029
Epoch 8/10
1/1 [==============================] - 0s 10ms/step - loss: 2.1581 - mae: 1.3027
Epoch 9/10
1/1 [==============================] - 0s 12ms/step - loss: 2.1573 - mae: 1.3024
Epoch 10/10
1/1 [==============================] - 0s 12ms/step - loss: 2.1567 - mae: 1.3022
<keras.callbacks.History at 0x7f3b865a75d0>

What is the meaning of loss < 0?

I am comparing two models: one uses binary_crossentropy as the loss (Model A), the other uses mean_squared_error (Model B).
Model A)
self.seq_len = 2
in_out_neurons = 50
n_hidden = 500
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, self.seq_len, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons, activation="relu"))
optimizer = Adam(learning_rate=0.001)
#model.compile(loss="mean_squared_error", optimizer=optimizer)
model.compile(loss='binary_crossentropy', optimizer=optimizer)
Epoch 1/10
718/718 [==============================] - 32s 42ms/step - loss: -0.0633 - val_loss: -0.0649
Epoch 2/10
718/718 [==============================] - 33s 46ms/step - loss: -0.0632 - val_loss: -0.0572
Epoch 3/10
718/718 [==============================] - 43s 60ms/step - loss: -0.0592 - val_loss: -0.0570
Epoch 4/10
718/718 [==============================] - 51s 71ms/step - loss: -0.0522 - val_loss: -0.0431
Epoch 5/10
718/718 [==============================] - 50s 69ms/step - loss: -0.0566 - val_loss: -0.0535
Epoch 6/10
718/718 [==============================] - 49s 68ms/step - loss: -0.0567 - val_loss: -0.0537
Epoch 7/10
718/718 [==============================] - 48s 67ms/step - loss: -0.0627 - val_loss: -0.0499
Epoch 8/10
718/718 [==============================] - 51s 71ms/step - loss: -0.0621 - val_loss: -0.0614
Epoch 9/10
718/718 [==============================] - 47s 65ms/step - loss: -0.0645 - val_loss: -0.0653
Epoch 10/10
718/718 [==============================] - 43s 60ms/step - loss: -0.0661 - val_loss: -0.0622
Model B)
self.seq_len = 2
in_out_neurons = 50
n_hidden = 500
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, self.seq_len, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons, activation="relu"))
optimizer = Adam(learning_rate=0.001)
model.compile(loss="mean_squared_error", optimizer=optimizer)
#model.compile(loss='binary_crossentropy', optimizer=optimizer)
Epoch 1/10
718/718 [==============================] - 36s 48ms/step - loss: 0.0189 - val_loss: 0.0190
Epoch 2/10
718/718 [==============================] - 46s 64ms/step - loss: 0.0188 - val_loss: 0.0189
Epoch 3/10
718/718 [==============================] - 48s 67ms/step - loss: 0.0187 - val_loss: 0.0189
Epoch 4/10
718/718 [==============================] - 58s 81ms/step - loss: 0.0187 - val_loss: 0.0188
Epoch 5/10
718/718 [==============================] - 62s 87ms/step - loss: 0.0186 - val_loss: 0.0188
Epoch 6/10
718/718 [==============================] - 72s 100ms/step - loss: 0.0186 - val_loss: 0.0188
Epoch 7/10
718/718 [==============================] - 73s 102ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 8/10
718/718 [==============================] - 60s 84ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 9/10
718/718 [==============================] - 64s 89ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 10/10
718/718 [==============================] - 64s 89ms/step - loss: 0.0185 - val_loss: 0.0187
Model B's loss is greater than 0, so that makes sense.
However, Model A's loss is less than 0. What does that mean?
Cross entropy is calculated as the negative expected value of the logarithm of the predicted probability. It is usually used after a sigmoid or softmax activation, where all values are <= 1, so their logarithms are <= 0 and the resulting loss is >= 0. But you use it after a relu activation, which can output values > 1, and that is why you obtain a result < 0. The moral is that the output layer activation and the loss should correspond to each other and must make sense from the point of view of the task you are trying to solve. Otherwise you may obtain senseless results.
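To illustrate that pairing, here is a minimal sketch of Model A with the output activation switched to sigmoid, the usual companion of binary_crossentropy; this assumes the targets are in [0, 1] (for continuous targets, a linear output with mean_squared_error, as in Model B, is the appropriate choice):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

seq_len = 2
in_out_neurons = 50
n_hidden = 500

model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, seq_len, in_out_neurons), return_sequences=True))
# sigmoid keeps every prediction in (0, 1), so each log-loss term is non-negative
model.add(Dense(in_out_neurons, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer=Adam(learning_rate=0.001))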

Binary Classification for binary dataset with DNN

I have a dataset of binary data like this:
| age (0-9) | age (10-19) | age (20-59) | age (10-19) | gender (male) | gender (female) | ... | desired (very much) | desired (moderate) | desired (little) | desired (None) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 |
The features here are the first few columns, and the target is the final 4 columns.
I'm trying to use a DNN implemented with tensorflow/keras to fit this data.
Here's my model and code:
input_layer = Input(shape=(len(x_training)))
x = Dense(30, activation="relu")(input_layer)
x = Dense(20, activation="relu")(x)
x = Dense(10, activation="relu")(x)
x = Dense(5, activation="relu")(x)
output_layer = Dense(4, activation="softmax")(x)

model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=['accuracy'])
model.fit(x=x_train,
          y=y_train,
          batch_size=128,
          epochs=10,
          validation_data=(x_validate, y_validate))
and this is the history of the training:
Epoch 1/10
2005/2005 [==============================] - 9s 4ms/step - loss: 1.3864 - accuracy: 0.2525 - val_loss: 1.3863 - val_accuracy: 0.2533
Epoch 2/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2518 - val_loss: 1.3864 - val_accuracy: 0.2486
Epoch 3/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2499 - val_loss: 1.3863 - val_accuracy: 0.2487
Epoch 4/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2515 - val_loss: 1.3863 - val_accuracy: 0.2539
Epoch 5/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2511 - val_loss: 1.3863 - val_accuracy: 0.2504
Epoch 6/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2501 - val_loss: 1.3863 - val_accuracy: 0.2484
Epoch 7/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2511 - val_loss: 1.3863 - val_accuracy: 0.2468
Epoch 8/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2509 - val_loss: 1.3863 - val_accuracy: 0.2519
Epoch 9/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2505 - val_loss: 1.3863 - val_accuracy: 0.2463
Epoch 10/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2512 - val_loss: 1.3863 - val_accuracy: 0.2474
<tensorflow.python.keras.callbacks.History at 0x7f6893c61e90>
The accuracy and the loss don't change at all. I have tried the following experiments, and all gave the same result:
- changed the hidden layers' activation to sigmoid and to tanh
- changed the final layer to a single node, labeled y_train with (1, 2, 3) instead of one-hot encoding, and changed the loss function to sparse categorical cross entropy
- changed the optimizer to Adam
- changed the data to be in (-1, 1) instead of (0, 1)
What am I missing here?
I figured out a method to solve this problem. I don't think it's very scientific, but it worked for my case.
First, I replaced every "1" in the training dataset with "0.8" and every "0" with "0.2".
Then I combined each group of related features using a weight vector. For example, if the age is 18, the age features are [0,1,0,0]; after the first step they become [0.2,0.8,0.2,0.2]; multiplying element-wise by [0.1,0.2,0.3,0.4] and summing gives 0.32, which represents the age 18 as a single value.
Applying these stages to the features gave me an array of length 15 instead of 22.
The third stage was to apply dimensionality reduction with PCA to bring the number of features down to 10.
This method essentially extracts new features from the existing ones by mapping them to a new domain instead of the binary domain.
This gave me an accuracy of about 85%, which was very satisfying to me.
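A rough sketch of that preprocessing could look like the following; the group boundaries, the weight vector, and the function names are illustrative assumptions, not the exact code that was used:

import numpy as np
from sklearn.decomposition import PCA

def soften(binary_row):
    # step 1: map 1 -> 0.8 and 0 -> 0.2
    return np.where(np.asarray(binary_row) == 1, 0.8, 0.2)

def collapse_group(soft_values, weights):
    # step 2: weighted sum of one group of related one-hot features
    return float(np.dot(soft_values, weights))

# example: the age group [0, 1, 0, 0] (age 18)
age = collapse_group(soften([0, 1, 0, 0]), np.array([0.1, 0.2, 0.3, 0.4]))
print(round(age, 2))  # 0.32

# step 3 (after collapsing every group of a row into a shorter vector X_collapsed):
# X_reduced = PCA(n_components=10).fit_transform(X_collapsed)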

Issue with Tensorflow classification - loss not decreasing

Following is the reference code:
Xtrain2 = df2.iloc[:,:-1].values
ytrain2 = df2['L'].values.reshape((200,1))
print(Xtrain2.shape, ytrain2.shape)
#--------------------------
lrelu = lambda x: tf.keras.activations.relu(x, alpha=0.1)

model2 = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1501, input_dim=1501, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    #tf.keras.layers.Dense(1),
    #tf.keras.layers.Dense(1, activation=lrelu)
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model2.compile(loss='binary_crossentropy',
               optimizer='adam',
               metrics=['accuracy'])
#--------------------------
model2.fit(Xtrain2, ytrain2, epochs=50)  #, verbose=0)
Just a simple attempt at a classifier. The last layer is sigmoid since it's just a binary classifier, and the loss is also appropriate for the problem. The dimension of the input is 1500 and the number of samples is 200. I get the following output:
(200, 1501) (200, 1)
Train on 200 samples
Epoch 1/50
200/200 [==============================] - 0s 2ms/sample - loss: 0.4201 - accuracy: 0.0300
Epoch 2/50
200/200 [==============================] - 0s 359us/sample - loss: -1.1114 - accuracy: 0.0000e+00
Epoch 3/50
200/200 [==============================] - 0s 339us/sample - loss: -4.6102 - accuracy: 0.0000e+00
Epoch 4/50
200/200 [==============================] - 0s 344us/sample - loss: -13.7864 - accuracy: 0.0000e+00
Epoch 5/50
200/200 [==============================] - 0s 342us/sample - loss: -34.7789 - accuracy: 0.0000e+00
.
.
.
Epoch 40/50
200/200 [==============================] - 0s 348us/sample - loss: -905166.4000 - accuracy: 0.3750
Epoch 41/50
200/200 [==============================] - 0s 344us/sample - loss: -1010177.5300 - accuracy: 0.3400
Epoch 42/50
200/200 [==============================] - 0s 354us/sample - loss: -1129819.1825 - accuracy: 0.3450
Epoch 43/50
200/200 [==============================] - 0s 379us/sample - loss: -1263355.3200 - accuracy: 0.3900
Epoch 44/50
200/200 [==============================] - 0s 359us/sample - loss: -1408803.0400 - accuracy: 0.3750
Epoch 45/50
200/200 [==============================] - 0s 355us/sample - loss: -1566850.5900 - accuracy: 0.3300
Epoch 46/50
200/200 [==============================] - 0s 359us/sample - loss: -1728280.7550 - accuracy: 0.3550
Epoch 47/50
200/200 [==============================] - 0s 354us/sample - loss: -1909759.2400 - accuracy: 0.3400
Epoch 48/50
200/200 [==============================] - 0s 379us/sample - loss: -2108889.7200 - accuracy: 0.3750
Epoch 49/50
200/200 [==============================] - 0s 369us/sample - loss: -2305491.9800 - accuracy: 0.3700
Epoch 50/50
200/200 [==============================] - 0s 374us/sample - loss: -2524282.6300 - accuracy: 0.3050
I don't see where I'm going wrong in the above code. Any help would be appreciated!