How to prioritise certain output in MultiOutput LSTM Tensorflow?

I am building an LSTM model with TensorFlow. My input data has the shape
(10000 users, 6 timesteps, 20 feature columns) => (10000, 6, 20)
The model performs binary classification with 20 output columns, so the output has the shape (10000, 20).
PS. This is not a classification with 20 classes; the model produces 20 independent binary outputs for each person.
Is it possible to prioritise certain output columns, for example by giving them larger weights, so that training penalises incorrect predictions on these more important columns more heavily than on the others? Or would it make more sense to create separate models for the important columns?

It's easy to use class weights with TensorFlow for this purpose. See the class_weight parameter for model.fit(): https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
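One caveat: class_weight is documented as a mapping from class indices to weights, so it may not behave as a per-column weight when every row carries 20 independent binary labels. A possible alternative is to weight the binary cross-entropy per output column. The sketch below only illustrates the arithmetic in plain Python; the function name, the toy data and the weight values are assumptions, not part of the original post. In Keras the same idea could be written as a custom loss passed to model.compile(), or the outputs could be split into separate heads and weighted with the loss_weights argument of compile().

```python
import math

def weighted_binary_crossentropy(y_true, y_pred, column_weights, eps=1e-7):
    """Weighted mean binary cross-entropy with one weight per output column."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p, w in zip(true_row, pred_row, column_weights):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
    # Normalise by the total weight so the result stays a weighted average.
    return total / (len(y_true) * sum(column_weights))

# Toy data: 2 users, 3 binary output columns (the question has 20).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[0.9, 0.2, 0.4], [0.1, 0.7, 0.6]]

uniform = weighted_binary_crossentropy(y_true, y_pred, [1.0, 1.0, 1.0])
# Hypothetical weights: the third column counts three times as much.
prioritised = weighted_binary_crossentropy(y_true, y_pred, [1.0, 1.0, 3.0])
print(round(uniform, 4), round(prioritised, 4))
```

The third column is the one predicted worst here, so giving it a larger weight raises the loss, which is the effect the question asks for.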

Related

LSTM multivariate predicting multiple features

I am new to neural networks and LSTMs, and I would be grateful for some guidance.
I have 2 years of historical bitcoin price data and a bitcoin sentiment dataset, both at one-hour intervals. My goal is to predict the next 60 hours of the chart using an LSTM.
I have seen some articles on multivariate time series prediction, but all of them predict only one feature, and only the price for one upcoming day. So in order to predict the next 2 months of data, I have to predict all of the features, so that I can feed the predicted data back in as input for the next prediction, and so on, to predict the next 60 days.
Can someone help me figure out how to do this kind of prediction?
Edit:
The dataset looks like this:
timestamp,close,sentiment
2020-05-01_00,8842.85,0.21
2020-05-01_01,8824.43,0.2
2020-05-01_02,8745.91,0.2
2020-05-01_03,8639.12,0.19
2020-05-01_04,8625.69,0.2
I would like to use TensorFlow as the backend. I have not written any code for building the model yet, as I need to know what to do before I start coding.
The idea is to give 100 or 150 rows of data as input to the model and then forecast the next 60 hours by feeding each prediction back in as input for the next prediction.
It would help if you shared some code showing how you are constructing your model and what your data looks like. How is your sentiment data encoded, and which framework are you using (TensorFlow, PyTorch, etc.)? I am mostly familiar with TensorFlow, so I'll point you in that direction.
In general it can be helpful to use an Input layer; note that LSTMs expect a 3D tensor [batch, timesteps, features].
You might want to consider a non-sequential model architecture using the functional API. If you went that route, you could have 2 separate inputs: one being the price time series, the other being the sentiment time series.
Pass each to an LSTM, then concatenate/combine them and pass the result to Dense layers or even convolutional layers.
Lastly, you could also look into ConvLSTM2D, which takes a 5D tensor: [samples, time, channels, rows, cols] with data_format='channels_first' (the default 'channels_last' ordering is [samples, time, rows, cols, channels]).
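To make the 3D [batch, timesteps, features] shape concrete, here is a minimal sketch in plain Python that cuts the sample rows from the question into sliding windows. The helper name make_windows and the window length of 3 are assumptions chosen for illustration; the question talks about 100 or 150 rows per window.

```python
def make_windows(rows, window, horizon=1):
    """Split a list of feature rows into (inputs, targets) for a sequence model.

    inputs has the shape [batch, window, features]; each target is the row
    that lies `horizon` steps after the end of its window.
    """
    inputs, targets = [], []
    for start in range(len(rows) - window - horizon + 1):
        inputs.append(rows[start:start + window])
        targets.append(rows[start + window + horizon - 1])
    return inputs, targets

# The five sample rows from the question as [close, sentiment] pairs.
rows = [
    [8842.85, 0.21],
    [8824.43, 0.20],
    [8745.91, 0.20],
    [8639.12, 0.19],
    [8625.69, 0.20],
]
X, y = make_windows(rows, window=3)
print(len(X), len(X[0]), len(X[0][0]))  # batch, timesteps, features
```

Each target row contains both close and sentiment, which is what a model needs to predict if its output is going to be fed back in as the next input.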
#------------------ Response (👇) post update: -------------------------
import tensorflow as tf

#=========== Design Model Architecture:
#==== Create Input Layers:
Price_Input = tf.keras.layers.Input(shape=(60,), name='Price_Input')  # Price as input
Sent_Input = tf.keras.layers.Input(shape=(60,), name='Sentiment_Input')  # Sentiment as input
#=== Handle Reshaping as part of the Model Architecture:
P_Input_rshp = tf.keras.layers.Reshape(target_shape=(60, 1),
                                       name='Price_Reshape')(Price_Input)  # Pass price to reshape layer
S_Input_rshp = tf.keras.layers.Reshape(target_shape=(60, 1),
                                       name='Sentiment_Reshape')(Sent_Input)  # Pass sentiment to reshape layer
#=== Use LSTM layers for timeseries:
P_x = tf.keras.layers.LSTM(units=1, activation='tanh', name='Price_LSTM')(P_Input_rshp)  # Price-focused LSTM
S_x = tf.keras.layers.LSTM(units=1, activation='tanh', name='Sentiment_LSTM')(S_Input_rshp)  # Sentiment-focused LSTM
C_x = tf.keras.layers.Concatenate(name='Concat')([P_x, S_x])  # Concatenate (join) the outputs of both branches
Output = tf.keras.layers.Dense(units=1, name='Dense')(C_x)  # Dense layer as model output to synthesize results
#============== Create Model Graph:
model = tf.keras.Model(inputs=[Price_Input, Sent_Input],
                       outputs=Output,
                       name='Double_LSTM_Model')
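The seeding idea from the question (feed each prediction back in as the newest input row) can be sketched independently of the framework. This is only an illustration under assumptions: recursive_forecast and naive_model are hypothetical names, and naive_model is a stand-in for a trained one-step predictor that returns every feature (close and sentiment), which the single-unit Dense output above would not do as written.

```python
def recursive_forecast(predict_next, history, window, steps):
    """Roll a one-step model forward by feeding its own predictions back in.

    predict_next: callable taking a list of `window` rows and returning one row.
    history: known rows, most recent last (at least `window` of them).
    """
    buffer = list(history[-window:])
    forecast = []
    for _ in range(steps):
        next_row = predict_next(buffer)
        forecast.append(next_row)
        buffer = buffer[1:] + [next_row]  # drop the oldest row, append the prediction
    return forecast

def naive_model(window_rows):
    """Hypothetical stand-in for a trained model: returns the column means."""
    n = len(window_rows)
    return [sum(column) / n for column in zip(*window_rows)]

history = [[1.0, 0.0], [2.0, 0.2], [3.0, 0.4]]
forecast = recursive_forecast(naive_model, history, window=3, steps=60)
print(len(forecast), forecast[0])
```

Keep in mind that errors compound with this approach, because every step after the first is predicted from earlier predictions rather than from observed data.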

How do I add an embedding layer in Keras starting from a pd dataframe?

I am trying to build a neural network using both categorical and numerical inputs using Keras to predict student grades ranging from 0-20.
My dataset is already split into train and test sets (two separate dataframes). I split the training set into numerical and categorical attributes. There are 17 categorical attributes and 16 numerical ones. Each categorical column only contains 3-4 categories so I have used OneHotEncoding to transform them. However, it creates unnecessary columns and I would like to experiment with embedding since it's more efficient.
I don't understand what I need to do in order to feed the categorical inputs into the neural model.
This is what my basic neural network looks like.
import tensorflow as tf
from tensorflow import keras

input = keras.layers.Input(shape=(58,))  # additional columns created through OHE
hidden1 = keras.layers.Dense(300, activation="relu")(input)
hidden2 = keras.layers.Dense(300, activation="relu")(hidden1)
concat = keras.layers.Concatenate()([input, hidden2])
output = keras.layers.Dense(21, activation="softmax")(concat)
model = keras.Model(inputs=[input], outputs=[output])
How can I expand it to include an embedding layer? Can I embed all the categorical columns together or would I need to add a layer for each?
I am using sparse categorical crossentropy as my loss function, but I guess I could use a different one now that the categorical inputs have been vectorized?
model.compile(loss="sparse_categorical_crossentropy", optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
I am very new to ML and NNs so apologies if my question is unclear.
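As a rough illustration of what an embedding does with categorical columns: each column is first integer-encoded, and each integer then selects a row from a per-column lookup table. The sketch below uses plain Python; the column names, category values and table contents are made up. In Keras, the usual pattern (an assumption about your setup, not something stated in the question) is one Embedding layer per categorical column, fed with these integer indices, with the flattened outputs concatenated with the numerical inputs.

```python
import random

def build_vocab(values):
    """Map each distinct category to an integer index (sorted for determinism)."""
    return {category: index for index, category in enumerate(sorted(set(values)))}

# Hypothetical categorical columns; the names and values are made up.
columns = {
    "school": ["GP", "MS", "GP", "GP"],
    "guardian": ["mother", "father", "other", "mother"],
}

random.seed(0)
embedding_dim = 2
vocabs, tables, encoded = {}, {}, {}
for name, values in columns.items():
    vocabs[name] = build_vocab(values)
    # One table per column: a row of `embedding_dim` floats per category.
    # In a real model these rows are trainable weights, not fixed numbers.
    tables[name] = [[random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
                    for _ in vocabs[name]]
    encoded[name] = [vocabs[name][value] for value in values]

# Embed the first student: look up one row per column, then concatenate.
first_student = []
for name in columns:
    first_student += tables[name][encoded[name][0]]
print(encoded["guardian"], len(first_student))
```

The choice of loss depends on the target (grades 0-20 treated as 21 classes), not on how the inputs are encoded, so sparse categorical crossentropy can stay as it is.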

Which Loss function & Metrics is more suitable for multi-label classification? Binary or Categorical cross-entropy and Why?

As far as I know (please correct me if I'm wrong):
Multi-label classification (mutually inclusive): samples might have more than one correct value (for example movie genre, disease detection, etc.).
Multi-class classification (mutually exclusive): samples always have exactly one correct value (for example cat or dog, object detection, etc.); this includes binary classification.
Assume the output is one-hot encoded.
Which loss functions and metrics does one have to use for these 2 types?
                 loss func.             metrics
1. multi-label:  (binary, categorical)  (binary_accuracy, TopKCategorical accuracy, categorical_accuracy, AUC)
2. multi-class:  (binary)               (binary_accuracy, f1, recall, precision)
Please tell me which entries in the table above are more suitable, which are wrong, and why.
If you are doing multi-class classification and the labels (y) are one-hot encoded, use categorical crossentropy as the loss function and the Adam optimizer (it is suitable for most cases). Also, in multi-class classification, the number of output nodes should be the same as the number of classes (labels). Say your model is going to classify the input into 4 classes; you can configure the output layer as follows:
model.add(tf.keras.layers.Dense(4, activation="softmax"))
Also, I forgot to mention that softmax activation should be used in the output layer for multi-class classification problems.
If your y is not one-hot encoded, I would advise you to choose sparse categorical crossentropy as the loss function. No other changes will be necessary.
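To see why switching between the two losses needs no other changes, here is a small plain-Python check that categorical crossentropy on a one-hot label and sparse categorical crossentropy on the matching integer label give the same value. The helper functions are written out by hand for illustration and are not the Keras implementations; the logits are made up.

```python
import math

def softmax(logits):
    shifted = [z - max(logits) for z in logits]  # subtract the max for stability
    exps = [math.exp(z) for z in shifted]
    return [e / sum(exps) for e in exps]

def categorical_crossentropy(one_hot, probs):
    """Cross-entropy for a one-hot encoded label."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy for an integer class label."""
    return -math.log(probs[label])

probs = softmax([2.0, 1.0, 0.1, -1.0])  # 4 classes, as in the example above
loss_one_hot = categorical_crossentropy([0, 1, 0, 0], probs)
loss_sparse = sparse_categorical_crossentropy(1, probs)
print(loss_one_hot, loss_sparse)
```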
Also, I usually split the data into test data and train data and feed them to the model like this to get the accuracy in each epoch:
history = model.fit(train_data, validation_data=test_data, epochs=10)
Hope this solves your problem.
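The answer above covers the multi-class case. For the multi-label case from the question, the common pairing is a sigmoid activation per output with binary crossentropy, because each label is then an independent yes/no decision. The plain-Python sketch below only contrasts the two activations; the logits and the 0.5 cut-off are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    exps = [math.exp(z - max(logits)) for z in logits]
    return [e / sum(exps) for e in exps]

logits = [3.0, 2.5, -2.0]  # e.g. scores for three movie genres

independent = [sigmoid(z) for z in logits]  # multi-label: one probability per label
exclusive = softmax(logits)                 # multi-class: probabilities sum to 1

predicted_labels = [i for i, p in enumerate(independent) if p > 0.5]
predicted_class = max(range(len(exclusive)), key=exclusive.__getitem__)
print(predicted_labels, predicted_class)
```

With sigmoid outputs, two genres can both be "on" for the same sample, while softmax forces the outputs to compete for a single winner.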

Extract the output of the embedding layer

I am trying to build a regression model, for which I have a nominal variable with very high cardinality. I am trying to get the categorical embedding of the column.
Input:
df["nominal_column"]
Output:
the embeddings of the column.
I want to use the output of the embedding column alone, since I need it as an input to my traditional regression model. Is there a way to extract just that output?
P.S I am not asking for code, any suggestion on the approach would be great.
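If the embedding is part of the model and you train it, then you can use the functional API of Keras to get the output of any intermediate operation in your graph:
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Embedding, Flatten
number_of_categories, embedding_dim = 1000, 8  # placeholder values, adjust to your data
x = Input(shape=(1,), dtype="int32")  # one integer category index per sample
y = Embedding(input_dim=number_of_categories, output_dim=embedding_dim)(x)
output = Dense(1)(Flatten()(y))  # stand-in for the rest of your model
model = Model(inputs=[x], outputs=[output, y])
If you do this before you train the model, you'll have to define a custom loss function that deals only with part of the output. The other way is to train the model with just one output, then create an identical model with two outputs and set the weights of the second model from the trained one.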
If you want to get the embedding matrix from your model, you can use the get_weights method of the embedding layer, which returns the weights as a list of NumPy arrays (the embedding matrix is the first element).
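Once you have the embedding matrix, turning the nominal column into features for a traditional regression model is a plain row lookup. A minimal sketch, assuming a made-up 3-category matrix and a hypothetical category_to_index mapping (the same mapping that was used to encode the column for training):

```python
# Hypothetical trained embedding matrix: one row per category index.
# With Keras this would be the first array returned by the layer's get_weights().
embedding_matrix = [
    [0.12, -0.40],   # index 0
    [0.55, 0.03],    # index 1
    [-0.21, 0.37],   # index 2
]
category_to_index = {"red": 0, "green": 1, "blue": 2}  # made-up categories

nominal_column = ["green", "blue", "green", "red"]

# Replace every category by its embedding row; these rows are the new features.
features = [embedding_matrix[category_to_index[c]] for c in nominal_column]
print(features)
```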

Convolutional Neural Network Training

I have a question regarding convolutional neural network (CNN) training.
I have managed to train a network using TensorFlow that takes an input image (1600 pixels) and outputs one of three classes that matches it.
Testing the network with variations of the trained classes gives good results. However, when I give it a different, fourth image (one that does not contain any of the 3 trained images), it always returns a random match to one of the classes.
My question is: how can I train a network to recognise that the image does not belong to any of the three trained classes? A similar example: if I trained a network on the MNIST database and then gave it the character "A" or "B", is there a way to detect that the input does not belong to any of the classes?
Thank you
Your model will always make predictions from your set of labels. For example, if you train your model on MNIST data, its predictions will always be 0-9, just like the MNIST labels.
What you can do is first train a different model with 2 classes that predicts whether an image belongs to dataset A or B. E.g., for MNIST data, you label all MNIST data as 1, add data from other sources that is different (not 0-9), and label it as 0. Then train a model to decide whether an image belongs to MNIST or not.
A convolutional neural network (CNN) predicts the result from the defined classes after training. A CNN always returns one of the classes, regardless of how confident it is. I have faced a similar problem; what you can do is check the confidence value (the predicted probability of the top class). If it is below some threshold, then the input belongs to the "none" category. Hope this helps.
You probably have three output nodes, and choose the maximum value (one-hot encoding). That's a bit unfortunate as it's a low number of outputs. Non-recognized inputs tend to cause pretty random outputs.
Now, with 3 outputs, roughly speaking you can get 7 outcomes. You might get a single high value (3 possibilities), but non-recognized input can also cause 2 high outputs (also 3 possibilities) or approximately equal outputs (1 possibility). So there's a decent chance (~ 3/7) of random inputs producing a pattern on the output nodes which you'd only expect for a recognized input.
Now, if you had 15 classes and thus 15 output nodes, you'd be looking at roughly 32767 possible outcomes for unrecognized inputs, only 15 of which correspond to expected one-hot outcomes.
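The counts in this argument follow from treating each output node as either "high" or "low" and excluding the all-low pattern, which gives 2^n - 1 patterns for n nodes, of which n are one-hot. A quick check in Python (the function name is made up for illustration):

```python
def outcome_counts(num_outputs):
    """Count coarse high/low output patterns, excluding 'all outputs low'."""
    total = 2 ** num_outputs - 1      # every non-empty set of high outputs
    one_hot = num_outputs             # exactly one output high
    return total, one_hot, total - one_hot

for n in (3, 4, 15):
    total, one_hot, ambiguous = outcome_counts(n)
    print(n, total, one_hot, ambiguous)
```

For 3 nodes this gives 7 patterns with 3 one-hot, for 15 nodes 32767 patterns with 15 one-hot, and for 4 nodes (three classes plus an "other" node) 15 patterns, 11 of which are ambiguous.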
Underlying this is a lack of training data. If your training set has examples outside the 3 classes, you can just dump this in a 4th "other" category and train with that. This by itself isn't a reliable indication, as usually the theoretical "other" set is huge, but you now have 2 complementary ways of detecting other inputs: either by the "other" output node or by one of the 11 ambiguous outputs.
Another solution would be to check what output your CNN usually gives when shown something else. I believe the last layer should be softmax, so that your CNN returns probabilities for the three given classes. If none of these probabilities is close to 1, this might be a sign that the input is something else, assuming your CNN is well trained (it must be penalised for overconfidence when predicting wrong labels).
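A minimal sketch of that thresholding idea in plain Python. The class names, the logits and the 0.9 threshold are assumptions; a suitable threshold would have to be tuned on held-out data that includes images from outside the three classes, since softmax outputs can be overconfident on unfamiliar inputs.

```python
import math

def softmax(logits):
    exps = [math.exp(z - max(logits)) for z in logits]
    return [e / sum(exps) for e in exps]

def classify_with_reject(logits, class_names, threshold=0.9):
    """Return the top class, or 'other' when no probability reaches the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best] if probs[best] >= threshold else "other"

names = ["class_a", "class_b", "class_c"]  # placeholder class names
confident = classify_with_reject([6.0, 0.5, 0.2], names)
uncertain = classify_with_reject([1.1, 1.0, 0.9], names)
print(confident, uncertain)
```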